working  paper 
department 
of  economics 


EMPIRICAL  STRATEGIES  IN  LABOR  ECONOMICS 

Joshua  D.  Angrist 
Alan  B.  Krueger 


October  1998 


massachusetts 

institute  of 

technology 

50  memorial  drive 
Cambridge,  mass.  02139 


No.  98-07Rev.  October  1998 




C:\. .  .  \projects\handbook\transfer\chapterl098.wpd    October  27,  1998 


Empirical  Strategies  in  Labor  Economics 

Joshua  D.  Angrist 
MIT  and  NBER 

and 

Alan  B.  Krueger 
Princeton  University  and  NBER 


October  1998 


*We  thank  Eric  Bettinger,  Lucia  Breierova,  Kristen  Harknett,  Aaron  Siskind,  Diane  Whitmore,  Eric  Wang, 
and  Steve  Wu  for  research  assistance.  For  helpful  comments  and  discussions  we  thank  Alberto  Abadie, 
Daron Acemoglu, Jere Behrman, David Card, Angus Deaton, Jeff Kling, Guido Imbens, Chris Mazingo,
Steve Pischke, and Cecilia Rouse. Of course, errors and omissions are solely the work of the authors. This
paper  was  prepared  for  the  Handbook  of  Labor  Economics. 


EMPIRICAL  STRATEGIES  IN  LABOR  ECONOMICS 
JOSHUA D. ANGRIST AND ALAN B. KRUEGER
Massachusetts  Institute  of  Technology  and  Princeton  University 

Contents 

1.  Introduction

2.  Identification  strategies  for  causal  relationships 

2.1  The  range  of  causal  questions 

2.2  Identification  in  regression  models 

2.2.1  Control  for  confounding  variables 

2.2.2  Fixed-effects  and  differences-in-differences 

2.2.3  Instrumental  variables 

2.2.4  Regression-discontinuity  designs 

2.3  Consequences  of  heterogeneity  and  nonlinearity 

2.3.1  Regression  and  the  conditional  expectation  function 

2.3.2  Matching  instead  of  regression 

2.3.3  Matching  using  the  propensity  score 

2.3.4  Interpreting  instrumental  variables  estimates 

2.4  Refutability 

3.  Data  collection  strategies 

3.1  Secondary  sources 

3.2  Primary  data  collection  and  survey  methods 

3.3  Administrative  data  and  record  linkage 

3.4  Combining  samples 

4.  Measurement  issues 

4.1  Measurement  error  models 

4.2  The  extent  of  measurement  error  in  labor  data 

4.3  Weighting  and  allocated  values 

5.  Summary 
Appendix 
References 


ABSTRACT 


Empirical  Strategies  in  Labor  Economics 


This  chapter  provides  an  overview  of  the  methodological  and  practical  issues  that  arise  when  estimating  causal 
relationships  that  are  of  interest  to  labor  economists.  The  subject  matter  includes  identification,  data  collection, 
and  measurement  problems.  Four  identification  strategies  are  discussed,  and  five  empirical  examples  -  the 
effects  of  schooling,  unions,  immigration,  military  service  and  class  size  -  illustrate  the  methodological  points. 
In  discussing  each  example,  we  adopt  an  experimentalist  perspective  that  draws  a  clear  distinction  between 
variables  that  have  causal  effects,  control  variables,  and  outcome  variables.  The  chapter  also  discusses 
secondary  data  sets,  primary  data  collection  strategies,  and  administrative  data.  The  section  on  measurement 
issues  focuses  on  recent  empirical  examples,  presents  a  summary  of  empirical  findings  on  the  reliability  of  key 
labor  market  data,  and  briefly  reviews  the  role  of  survey  sampling  weights  and  the  allocation  of  missing  values 
in  empirical  research. 


JEL Numbers: J00, J31, C10, C81


1.  Introduction 

Empirical  analysis  is  more  common  and  relies  on  more  diverse  sources  of  data  in  labor  economics  than 
in  economics  more  generally.  Table  1,  which  updates  Stafford's  (1986,  Table  7.2)  survey  of  research  in  labor 
economics,  bears  out  this  claim.  Indeed,  almost  80%  of  recent  articles  published  in  labor  economics  contain 
some  empirical  work,  and  a  striking  two-thirds  analyzed  micro  data.  In  the  1970s,  micro  data  became  more 
common  in  studies  of  the  labor  market  than  time-series  data,  and  by  the  mid-90s  the  use  of  micro  data 
outnumbered  time-series  data  by  a  factor  of  over  ten  to  one.  The  use  of  micro  and  time-series  data  is  more 
evenly  split  in  other  fields  of  economics. 

In addition to using micro data more often, labor economists have come to rely on a wider range of
data sets than other economists. The fraction of published papers using data other than what is in standard
public-use files reached 38 percent in the period from 1994 to 1997. The files in the "all other micro data
sets" category in Table 1 include primary data sets collected by individual researchers, customized public use
files, administrative records, and administrative-survey links. This is noteworthy because about ten years ago,
in his Handbook of Econometrics survey of economic data issues, Griliches (1986, p. 1466) observed:

". . .  since  it  is  the  'badness'  of  the  data  that  provides  us  with  our  living,  perhaps  it  is  not  at 
all  surprising  that  we  have  shown  little  interest  in  improving  it,  in  getting  involved  in  the 
grubby  task  of  designing  and  collecting  original  data  sets  of  our  own." 

The growing list of papers involving some sort of original data collection suggests this situation may be
changing; examples include Freeman and Hall (1986), Ashenfelter and Krueger (1994), Anderson and Meyer
(1994), Card and Krueger (1994, 1998), Dominitz and Manski (1997), Imbens, Rubin and Sacerdote (1997),
and Angrist (1998).

Labor economics has also come to be distinguished by the use of cutting edge econometric and
statistical methods. This claim is supported by the observation that outside of time-series econometrics, many
and perhaps most innovations in econometric technique and style since the 1970s were largely motivated by
research on labor-related topics. These innovations include sample selection models, nonparametric methods
for censored data and survival analysis, quantile regression, and the renewed interest in statistical and
identification problems related to instrumental variables estimators and quasi-experimental methods.

What  do  labor  economists  do  with  all  the  data  they  analyze?  A  broad  distinction  can  be  made  between 
two  types  of  empirical  research  in  labor  economics:  descriptive  analysis  and  causal  inference.  Descriptive 
analysis  can  establish  facts  about  the  labor  market  that  need  to  be  explained  by  theoretical  reasoning  and  yield 
new  insights  into  economic  trends.  The  importance  of  ostensibly  mundane  descriptive  analysis  can  be  captured 
by  Sherlock  Holmes's  admonition  that:  "It  is  a  capital  offense  to  theorize  before  all  the  facts  are  in."  A  great 
deal  of  important  research  falls  under  the  descriptive  heading,  including  work  on  trends  in  poverty  rates,  labor 
force  participation,  and  wage  levels.  A  good  example  of  descriptive  research  of  major  importance  is  the  work 
documenting the increase in wage dispersion in the 1980s (see, e.g., Levy, 1987; Murphy and Welch, 1992;
Katz  and  Murphy,  1992;  Juhn,  Murphy,  and  Pierce,  1993).  This  research  has  inspired  a  vigorous  search  for 
the  causes  of  changes  in  the  wage  distribution. 

In  contrast  with  descriptive  analysis,  causal  inference  research  seeks  to  determine  the  effects  of 
particular  interventions  or  policies,  or  to  estimate  features  of  the  behavioral  relationships  suggested  by 
economic  theory.  Causal  inference  and  descriptive  analysis  are  not  competing  methods;  indeed,  they  are  often 
complementary.  In  the  example  mentioned  above,  compelling  evidence  that  wage  dispersion  increased  in  the 
1980s  inspired  a  search  for  causes  of  these  changes.  Causal  inference  is  often  more  difficult  than  descriptive 
analysis,  and  consequently  more  controversial. 

Most  labor  economists  seem  to  share  a  common  view  of  the  importance  of  descriptive  research,  but 
there  are  differences  in  views  regarding  the  role  economic  theory  can  or  should  play  in  causal  modeling.  This 
division  is  illustrated  by  the  debate  over  social  experimentation  (Burtless,  1995;  Heckman  and  Smith,  1995), 
in  contrasting  approaches  to  studying  the  impact  of  immigration  on  the  earnings  of  natives  (Card,  1990;  Borjas, 
Freeman  and  Katz,  1997),  and  in  recent  symposia  illustrating  alternative  research  styles  (Angrist,  1995a;  Keane 
and Wolpin, 1997). Research in a structuralist style relies heavily on economic theory to guide empirical work
or to make predictions. Keane and Wolpin (1997, p. 111) describe structural work as trying to do one of two
things: (a) recover the primitives of economic theory (parameters determining preferences and technology);
(b)  estimate  decision  rules  derived  from  economic  models.  Given  success  in  either  of  these  endeavors,  it  is 
usually  clear  how  to  make  causal  statements  and  to  generalize  from  the  specific  relationships  and  populations 
studied  in  any  particular  application. 

An  alternative  to  structural  modeling,  often  called  the  quasi-experimental  or  simply  the 
"experimentalist"  approach,  also  uses  economic  theory  to  frame  causal  questions.  But  this  approach  puts  front 
and  center  the  problem  of  identifying  the  causal  effects  from  specific  events  or  situations.  The  problem  of 
generalization  of  findings  is  often  left  to  be  tackled  later,  perhaps  with  the  aid  of  economic  theory  or  informal 
reasoning.  Often  this  process  involves  the  analysis  of  additional  quasi-experiments,  as  in  recent  work  on  the 
returns  to  schooling  (see,  e.g.,  the  papers  surveyed  by  Card  in  this  volume).  In  his  methodological  survey, 
Meyer  (1995)  describes  quasi-experimental  research  as  "an  outburst  of  work  in  economics  that  adopts  the 
language  and  conceptual  framework  of  randomized  experiments."  Here,  the  ideal  research  design  is  explicitly 
taken  to  be  a  randomized  trial  and  the  observational  study  is  offered  as  an  attempt  to  approximate  the  force  of 
evidence  generated  by  an  actual  experiment. 

In  either  a  structural  or  quasi-experimental  framework,  the  researcher's  task  is  to  estimate  features  of 
the causal relationships of interest. This chapter focuses on the empirical strategies commonly used to
estimate  features  of  the  causal  relationships  that  are  of  interest  to  labor  economists.  The  chapter  provides  an 
overview  of  the  methodological  and  practical  issues  that  arise  in  implementing  an  empirical  strategy.  We  use 
the  term  empirical  strategy  broadly,  beginning  with  the  statement  of  a  causal  question,  and  extending  to 
identification  strategies  and  econometric  methods,  selection  of  data  sources,  measurement  issues,  and 
sensitivity  tests.  The  choice  of  topics  was  guided  by  our  own  experiences  as  empirical  researchers  and  our 
research interests. As far as econometric methods go, however, our overview is especially selective; for the
most part we ignore structural modeling since that topic is well covered elsewhere.1 Of course, there is
considerable  overlap  between  structural  and  quasi-experimental  approaches  to  causal  modeling,  especially 
when  it  comes  to  data  and  measurement  issues.  The  difference  is  primarily  one  of  emphasis,  because  structural 
modeling  generally  relies  on  assumptions  about  exogenous  variability  in  certain  variables  and  quasi- 
experimental  analyses  require  some  theoretical  assumptions. 

The  attention  we  devote  to  quasi-experimental  methods  is  also  motivated  by  skepticism  about  the 
credibility of empirical research in economics. For example, in a critique of the practice of modern
econometrics,  Lester  Thurow  (1983,  pp.  106-107)  argued: 

"Economic  theory  almost  never  specifies  what  secondary  variables  (other  than  the  primary 
ones  under  investigation)  should  be  held  constant  in  order  to  isolate  the  primary  effects.  ... 
When  we  look  at  the  impact  of  education  on  individual  earnings,  what  else  should  be  held 
constant:  IQ,  work  effort,  occupational  choice,  family  background?  Economic  theory  does 
not  say.  Yet  the  coefficients  of  the  primary  variables  almost  always  depend  on  precisely  what 
other variables are entered in the equation to 'hold everything else constant.'"

This  view  of  applied  research  strikes  us  as  being  overly  pessimistic,  but  we  agree  with  the  focus  on  omitted 
variables.  In  labor  economics,  at  least,  the  current  popularity  of  quasi-experiments  stems  precisely  from  this 
concern:  because  it  is  typically  impossible  to  control  adequately  for  all  relevant  variables,  it  is  often  desirable 
to  seek  situations  where  one  has  a  reasonable  presumption  that  the  omitted  variables  are  uncorrelated  with  the 
variables  of  interest.  Such  situations  may  arise  if  the  researcher  can  use  random  assignment,  or  if  the  forces 
of  nature  or  human  institutions  provide  something  close  to  random  assignment. 

The next section reviews four identification strategies that are commonly used to answer causal
questions in contemporary labor economics. Five empirical examples - the effects of schooling, unions,
immigration, military service, and class size - illustrate the methodological points throughout the chapter. In
keeping with our experimentalist perspective, we attempt to draw clear distinctions between variables that have
causal effects, control variables, and outcome variables in each example.

1See, for example, Heckman and MaCurdy's (1986) Handbook of Econometrics chapter, which "outlines the
econometric framework developed by labor economists who have built theoretically motivated models to explain the
new data." (p. 1918). We also have little to say about descriptive analysis because descriptive statistics are
commonly discussed in statistics courses and books (see, e.g., Tufte, 1992, or Tukey, 1977).

In  Section  3  we  turn  to  a  discussion  of  secondary  data  sets  and  primary  data  collection  strategies.  The 
focus  here  is  on  data  for  the  United  States.2  Section  3  also  offers  a  brief  review  of  issues  that  arise  when 
conducting  an  original  survey  and  suggestions  for  assembling  administrative  data  sets.  Because  existing 
public-use  data  sets  have  already  been  extensively  analyzed,  primary  data  collection  is  likely  to  be  a  growth 
industry  for  labor  economists  in  the  future.  Following  the  discussion  of  data  sets,  Section  4  discusses 
measurement  issues,  including  a  brief  review  of  classical  models  for  measurement  error  and  some  extensions. 
Since  most  of  this  theoretical  material  is  covered  elsewhere,  including  the  Griliches  (1986)  chapter  mentioned 
previously,  our  focus  is  on  recent  empirical  examples.  This  section  also  presents  a  summary  of  empirical 
findings  on  the  reliability  of  labor  market  data,  and  reviews  the  role  of  survey  sampling  weights  and  the 
allocation  of  missing  values  in  empirical  research. 


2.  Identification  strategies  for  causal  relationships 


The  object  of  science  is  the  discovery  of  relations  ...  of  which 
the  complex  may  be  deduced  from  the  simple. 


John Pringle Nichol, 1840 (quoted in Lord Kelvin's class notes).

2.1  The range of causal questions

The most challenging empirical questions in economics involve "what if" statements about
counterfactual outcomes. Classic examples of "what if" questions in labor market research concern the effects
of career decisions like college attendance, union membership, and military service. Interest in these questions
is motivated by immediate policy concerns, theoretical considerations, and problems facing individual decision
makers. For example, policy makers would like to know whether military cutbacks will reduce the earnings
of minority men who have traditionally seen military service as a major career opportunity. Additionally, many
new high school graduates would like to know what the consequences of serving in the military are likely to
be for them. Finally, the theory of on-the-job training generates predictions about the relationship between
time spent serving in the military and civilian earnings.

2Overviews of data sources for developing countries appear in Deaton's (1995) chapter in The Handbook of
Development Economics, Grosh and Glewwe (1996, 1998), and Kremer (1997). We are not aware of a
comprehensive survey of micro data sets for labor market research in Europe, though a few sources and studies are
referenced in Westergard-Nielson (1989).

Regardless  of  the  motivation  for  studying  the  effects  of  career  decisions,  the  causal  relationships  at 
the  heart  of  these  questions  involve  comparisons  of  counterfactual  states  of  the  world.  Someone  -  the 
government,  an  individual  decision  maker,  or  an  academic  economist  -  would  like  to  know  what  outcomes 
would  have  been  observed  if  a  variable  were  manipulated  or  changed  in  some  way.  Lewis's  (1986)  study  of 
union wage effects gives a concise description of this type of inference problem (p. 2): "At any
given  date  and  set  of  working  conditions,  there  is  for  each  worker  a  pair  of  wage  figures,  one  for  unionized 
status  and  the  other  for  nonunion  status".  Differences  in  these  two  potential  outcomes  define  the  causal  effects 
of  interest  in  Lewis's  work,  which  uses  regression  to  estimate  the  average  gap  between  them.3  At  first 
glance,  the  idea  of  unobserved  potential  outcomes  seems  straightforward,  but  in  practice  it  is  not  always  clear 
exactly  how  to  define  a  counterfactual  world.  In  the  case  of  union  status,  for  example,  the  counterfactual  is 
likely  to  be  ambiguous.  Is  the  effect  defined  relative  to  a  world  where  unionization  rates  are  what  they  are 
now,  a  world  where  everyone  is  unionized,  a  world  where  everyone  in  the  worker's  firm  or  industry  is 
unionized,  or  a  world  where  no  one  is  unionized?  Simple  micro-economic  analysis  suggests  that  the  answers 
to  these  questions  differ.  This  point  is  at  the  heart  of  Lewis's  (1986)  distinction  between  union  wage  gaps, 
which  refers  to  causal  effects  on  individuals,  and  wage  gains,  which  refers  to  comparisons  of  equilibria  in  a 
world  with  and  without  unions.  In  practice,  however,  the  problem  of  ambiguous  counterfactuals  is  typically 
resolved by focusing on the consequences of hypothetical manipulations in the world as is, i.e., assuming there
are no general equilibrium effects.4

3See also Rubin (1974, 1977) and Holland (1986) for formal discussions of counterfactual outcomes in causal
research.

Even if ambiguities in the definition of counterfactual states can be resolved, it is still difficult to learn
about  differences  in  counterfactual  outcomes  because  the  outcome  of  one  scenario  is  all  that  is  ever  observed 
for  any  one  unit  of  observation  (e.g.,  a  person,  State,  or  firm).  Given  this  basic  difficulty,  how  do  researchers 
learn  about  counterfactual  states  of  the  world  in  practice?  In  many  fields,  and  especially  in  medical  research, 
the  prevailing  view  is  that  the  best  evidence  about  counterfactuals  is  generated  by  randomized  trials  because 
randomization  ensures  that  outcomes  in  the  control  group  really  do  capture  the  counterfactual  for  a  treatment 
group.  Thus,  Federal  guidelines  for  a  new  drug  application  require  that  efficacy  and  safety  be  assessed  by 
randomly  assigning  the  drug  being  studied  or  a  placebo  to  treatment  and  control  groups  (Center  for  Drug 
Evaluation and Research, 1988). Leamer (1982) suggested that the absence of randomization is the main
reason  why  econometric  research  often  appears  less  convincing  than  research  in  other  more  experimental 
sciences.  Randomized  trials  are  certainly  rarer  in  economics  than  in  medical  research,  but  labor  economists 
are  increasingly  likely  to  use  randomization  to  study  the  effects  of  labor  market  interventions  (Passell,  1992). 
In  fact,  a  recent  survey  of  economists  by  Fuchs,  Krueger,  and  Poterba  (1998)  finds  that  most  labor  economists 
place  more  credence  in  studies  of  the  effect  of  government  training  programs  on  participants'  income  if  the 
research  design  entails  random  assignment  than  if  the  research  design  is  based  on  structural  modeling. 

Unfortunately,  economists  rarely  have  the  opportunity  to  randomize  variables  like  educational 
attainment,  immigration,  or  minimum  wages.  Empirical  researchers  must  therefore  rely  on  observational 
studies  that  typically  fail  to  generate  the  same  force  of  evidence  as  a  randomized  experiment.  But  the  object 
of  an  observational  study,  like  an  experimental  study,  can  still  be  to  make  comparisons  that  provide  evidence 
about  causal  effects.  Observational  studies  attempt  to  accomplish  this  by  controlling  for  observable  differences 
between comparison groups using regression or matching techniques, using pre-post comparisons on the same
units of observation to reduce bias from unobserved differences, and by using instrumental variables as a
source of quasi-experimental variation. Randomized trials form a conceptual benchmark for assessing the
success or failure of observational study designs that make use of these ideas, even when it is clear that it may
be impossible or at least impractical to study some questions using random assignment. In almost every
observational study, it makes sense to ask whether the research design is a good "natural experiment."5

4Lewis's (1963) earlier book discussed causal effects in terms of industries and sectors, and made a distinction
between "direct" and "indirect" effects of unions similar to the distinction between wage gaps and wage gains.
Heckman, Lochner, and Taber (1998) discuss general equilibrium effects that arise in the evaluation of college
tuition subsidies.

A  sampling  of  causal  questions  that  economists  have  studied  without  benefit  of  a  randomized 
experiment  appears  in  Table  2,  which  characterizes  a  few  observational  studies  grouped  according  to  the 
source  of  variation  used  to  make  causal  inferences  about  a  single  "causing  variable."  The  distinction  between 
causing  variables  and  control  variables  in  Table  2  is  one  difference  between  the  discussion  in  this  chapter  and 
traditional  econometric  texts,  which  tend  to  treat  all  variables  symmetrically.  The  combination  of  a  clearly 
labeled  source  of  identifying  variation  in  a  causal  variable  and  the  use  of  a  particular  econometric  technique 
to  exploit  this  information  is  what  we  call  an  identification  strategy.  Studies  were  selected  for  Table  2 
primarily  because  the  source  or  type  of  variation  that  is  being  used  to  make  causal  statements  is  clearly  labeled. 
The  four  approaches  to  identification  described  in  the  table  are:  Control  for  Confounding  Variables,  Fixed- 
effects  and  Differences-in-differences,  Instrumental  Variables,  and  Regression  Discontinuity  methods.  This 
taxonomy  provides  an  outline  for  the  next  section. 

2.2  Identification in regression models
2.2.1  Control  for  confounding  variables 

Labor economists have long been concerned with the question of whether the observed positive
association between schooling and earnings is a causal relationship. This question originates partly in the
observation that people with more schooling appear to have other characteristics, such as wealthier parents,
that are also associated with higher earnings. Also, the theory of human capital identifies unobserved earnings
potential or "ability" as one of the principal determinants of educational attainment (see, e.g., Willis and Rosen,
1979). The most common identification strategy in research on schooling (and in economics in general)
attempts to reduce bias in naive comparisons by using regression to control for variables that are confounded
with (i.e., related to) schooling. The typical estimating equation in this context is

(1)  $Y_i = X_i'\beta_r + \rho_r S_i + \varepsilon_i$,

where $Y_i$ is person i's log wage or earnings, $X_i$ is a $k \times 1$ vector of control variables, including measures of
ability and family background, $S_i$ is years of educational attainment, and $\varepsilon_i$ is the regression error. The vector
of population parameters is $[\beta_r' \; \rho_r]'$. The "r" subscript on the parameters signifies that these are regression
coefficients. The question of causality concerns the interpretation of these coefficients. For example, they can
always be viewed as providing the best (i.e., minimum-mean-squared-error) linear predictor of $Y_i$.6 The best
linear predictor need not have causal or behavioral significance; the resulting residual is uncorrelated with the
regressors simply because the first-order conditions for the prediction problem are $E[\varepsilon_i X_i] = 0$ and $E[\varepsilon_i S_i] = 0$.

5This point is also made by Freeman (1989). The notion that experimentation is an ideal research design for
Economics goes back at least to the Cowles Commission. See, for example, Girshick and Haavelmo (1947), who
wrote (p. 79): "In economic theory ... the total demand for the commodity may be considered a function of all
prices and of total disposable income of all consumers. The ideal method of verifying this hypothesis and obtaining
a picture of the demand function involved would be to conduct a large-scale experiment, imposing alternative prices
and levels of income on the consumers and studying their reactions."
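The point that the regression residual is orthogonal to the regressors by construction, whatever the causal status of the coefficients, can be checked numerically. The following is a minimal sketch on simulated data; the data-generating process, the parameter values (0.05 and 0.10), and all variable names are assumptions made for illustration and do not come from the studies discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated (hypothetical) data: X_i is a single "ability" control,
# and schooling S_i is positively related to ability.
ability = rng.normal(size=n)
schooling = 12 + 2 * ability + rng.normal(size=n)
log_wage = 1.0 + 0.05 * schooling + 0.10 * ability + rng.normal(scale=0.5, size=n)


def ols(y, *columns):
    """Least-squares fit of y on a constant plus the given columns.

    Returns the coefficients, the residuals, and the regressor matrix.
    """
    W = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef, y - W @ coef, W


# "Long" regression: constant, X_i (ability), S_i (schooling).
coef, resid, W = ols(log_wage, ability, schooling)
moments = W.T @ resid / n  # sample analogues of E[e_i X_i] and E[e_i S_i]

# "Short" regression that omits the ability control.
coef_short, resid_short, W_short = ols(log_wage, schooling)
moments_short = W_short.T @ resid_short / n

print("schooling coefficient, with ability control:", round(coef[2], 3))
print("schooling coefficient, without control:     ", round(coef_short[1], 3))
print("largest |moment|, with control:   ", np.abs(moments).max())
print("largest |moment|, without control:", np.abs(moments_short).max())
```

In both regressions the sample moments are zero up to floating-point error, even though the two schooling coefficients differ. Orthogonality of the residual is therefore a property of the fitting procedure and carries no information about which coefficient, if either, is causal.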

Regression  estimates  from  five  early  studies  of  the  relationship  between  schooling,  ability,  and 
earnings  are  summarized  in  Table  3.  The  first  row  reports  estimates  without  ability  controls  while  the  second 
row  reports  estimates  that  include  some  kind  of  test  score  in  the  X-vector  as  a  control  for  ability.  Information 
about  the  X-variables  is  given  in  the  rows  labeled  "ability  variable"  and  "other  controls".  The  first  two  studies, 
Ashenfelter  and  Mooney  (1968)  and  Hansen,  Weisbrod,  and  Scanlon  (1970)  use  data  on  individuals  at  the 
extremes  of  the  ability  distribution  (graduate  students  and  military  rejects),  while  the  others  use  more 
representative  samples.  Results  from  the  last  two  studies,  Griliches  and  Mason  (1972)  and  Chamberlain 
(1978),  are  reported  for  models  with  and  without  family  background  controls. 

The schooling coefficients in Table 3 are smaller than the coefficient estimates we are used to seeing
in studies using more recent data (see, e.g., Card's survey in this volume). This is partly because the
association between earnings and schooling has increased, partly because the samples used in the papers
summarized in the table include only young men, and partly because the models used for estimation control
for age and not potential experience (age - education - 6). The latter parameterization leads to larger coefficient
estimates since, in a linear model, the schooling coefficient controlling for age is equal to the schooling
coefficient controlling for experience minus the experience coefficient. The only specification in Table 3 that
controls for potential experience is from Griliches (1977), which also generates the highest estimate in the table
(.065). The corresponding estimate controlling for age is .022. The table also shows that controlling for ability
and family background generally reduces the magnitude of schooling coefficients, implying that at least some
of the association between earnings and schooling in these studies can be attributed to variables other than
schooling.

6The best linear predictor is the solution to $\min_{b,c} E[(Y_i - X_i'b - cS_i)^2]$. See, e.g., White (1980), or Goldberger (1991).
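The relationship between the age and potential-experience parameterizations follows from substituting experience = age - education - 6 into a linear model: a coefficient b_S on schooling and b_E on experience imply a coefficient of b_S - b_E on schooling and b_E on age. The sketch below verifies this identity on simulated data; the sample, the coefficient values (0.08 and 0.03), and the variable names are hypothetical and are not estimates from Table 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Hypothetical sample: years of schooling, age, and potential experience.
schooling = rng.integers(8, 19, size=n).astype(float)
age = rng.integers(25, 41, size=n).astype(float)
experience = age - schooling - 6
log_wage = 0.5 + 0.08 * schooling + 0.03 * experience + rng.normal(scale=0.4, size=n)


def ols(y, *columns):
    """Least-squares coefficients for y on a constant plus the given columns."""
    W = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef


b_exp = ols(log_wage, schooling, experience)  # [constant, schooling, experience]
b_age = ols(log_wage, schooling, age)         # [constant, schooling, age]

print("schooling coefficient, controlling for experience:", round(b_exp[1], 4))
print("experience coefficient:                            ", round(b_exp[2], 4))
print("schooling coefficient, controlling for age:        ", round(b_age[1], 4))
print("difference of the first two:                       ", round(b_exp[1] - b_exp[2], 4))
```

Because the two regressor sets span the same space, the identity holds exactly in any sample, not just in expectation. It relies on the model being linear in experience, as the text notes.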

What conditions must be met for regression estimates like those in Table 3 to have a causal
interpretation? In this case, causality can be based on an underlying functional relationship that describes
what a given individual would earn if he or she obtained different levels of education. This relationship may
be person-specific, so we write

(2)  $Y_{Si} \equiv f_i(S)$

to denote the potential (or latent) earnings that person i would receive after obtaining S years of education.
Note that the function $f_i(S)$ has an "i" subscript on it while S does not. This highlights the fact that although
S is a variable, it is not a random variable. The function $f_i(S)$ tells us what i would earn for any value of
schooling, S, and not just for the realized value, $S_i$. In other words, $f_i(S)$ answers "what if" questions. In the
context of theoretical models of the relationship between human capital and earnings, the form of $f_i(S)$ may
be determined by aspects of individual behavior and/or market forces. With or without an explicit economic
model for $f_i(S)$, however, we can think of this function as describing the earnings level of individual i if that
person were assigned schooling level S (e.g., in an experiment).

Once the causal relationship of interest, $f_i(S)$, has been defined, it can be linked to the observed
association between schooling and earnings. A convenient way to do this is with a linear model:

(3)  $f_i(S) = \beta_0 + \rho S + \eta_i$.

In addition to being linear, this equation says that the functional relationship of interest is the same for all
individuals. Again, S is written without a subscript, because equation (3) tells us what person i would earn
for any value of S and not just the realized value, $S_i$. The only individual-specific and random part of $f_i(S)$ is
a mean-zero error component, $\eta_i$, which captures unobserved factors that determine earnings. In practice,
regression estimates have a causal interpretation under weaker functional-form assumptions than this but we
postpone a detailed discussion of this point until Section 2.3. Note that the earnings of someone with no
schooling at all is just $\beta_0 + \eta_i$ in this model.

Substituting the observed value $S_i$ for S in equation (3), we have

(4)  $Y_i = \beta_0 + \rho S_i + \eta_i$.

This looks like equation (1) without covariates, except that equation (4) explicitly associates the regression
coefficients with a causal relationship. The OLS estimate of $\rho$ in equation (4) has probability limit

(5)  $C(Y_i, S_i)/V(S_i) = \rho + C(S_i, \eta_i)/V(S_i)$.

The term $C(S_i, \eta_i)/V(S_i)$ is the coefficient from a regression of $\eta_i$ on $S_i$, and reflects any correlation between
the realized $S_i$ and unobserved individual earnings potential, which in this case is the same as correlation with
$\eta_i$. If educational attainment were randomly assigned, as in an experiment, then we would have $C(S_i, \eta_i)=0$
in the linear model. In practice, however, schooling is a consequence of individual decisions and institutional
forces that are likely to generate correlation between $\eta_i$ and schooling. Consequently, it is not automatic that
OLS provides a consistent estimate of the parameter of interest.7
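The probability limit in equation (5) can be checked with a small simulation. This is only a sketch: the parameter values, the normal distributions, and the helper name `ols_slope` are assumptions chosen for illustration, not quantities taken from the studies discussed here.

```python
import numpy as np

def ols_slope(y, s):
    """Bivariate OLS slope: sample C(y, s) / V(s)."""
    return np.cov(y, s)[0, 1] / np.var(s, ddof=1)

rng = np.random.default_rng(0)
n = 200_000
beta0, rho = 1.0, 0.08          # illustrative values, not estimates from the text

eta = rng.normal(size=n)                                   # unobserved earnings potential
s_random = rng.normal(12.0, 2.0, size=n)                   # schooling assigned at random
s_chosen = 12.0 + 1.0 * eta + rng.normal(0.0, 2.0, size=n) # schooling related to eta

y_random = beta0 + rho * s_random + eta                    # equation (4), random S
y_chosen = beta0 + rho * s_chosen + eta                    # equation (4), chosen S

slope_random = ols_slope(y_random, s_random)               # close to rho
slope_chosen = ols_slope(y_chosen, s_chosen)               # rho plus a bias term
# Right-hand side of equation (5), computed from the simulated data
rhs_eq5 = rho + np.cov(s_chosen, eta)[0, 1] / np.var(s_chosen, ddof=1)
```

Under these assumed values the population bias term is C(S, eta)/V(S) = 1/5, so the slope with chosen schooling settles near 0.28 rather than 0.08, while the slope with randomly assigned schooling stays near 0.08.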

7Econometric textbooks (e.g., Pindyck and Rubinfeld, 1991) sometimes refer to regression models for causal
relationships as "true models," but this seems like potentially misleading terminology since non-behavioral
descriptive regressions could also be described as being "true".

Regression strategies attempt to overcome this correlation in a very simple way: in addition to the
functional form assumption for potential outcomes embodied in (3), the random part of individual earnings
potential, $\eta_i$, is decomposed into a linear function of the k observable characteristics, $X_i$, and an error term, $\varepsilon_i$,

(6a)  $\eta_i = X_i'\beta + \varepsilon_i$,

where $\beta$ is a vector of population regression coefficients. This means that $\varepsilon_i$ and $X_i$ are uncorrelated by
construction. The key identifying assumption is that the observable characteristics, $X_i$, are the only reason why
$\eta_i$ and $S_i$ (equivalently, $f_i(S)$ and $S_i$) are correlated, so

(6b)  $E[S_i \varepsilon_i] = 0$.

This  is  the  "selection  on  observables"  assumption  discussed  by  Barnow,  Cain,  and  Goldberger  (1981),  where 
the  regressor  of  interest  is  assumed  to  be  determined  independently  of  potential  outcomes  after  accounting  for 
a  set  of  observable  characteristics. 

Continuing  to  maintain  the  selection-on-observables  assumption,  a  consequence  of  (6a)  and  (6b)  is 
that 

(7)  $C(Y_i, S_i)/V(S_i) = \rho + \phi_{Sx}'\beta$,

where $\phi_{Sx}$ is a k×1 vector of coefficients from a regression of each element of $X_i$ on $S_i$. Equation (7) is the
well-known "omitted variables bias" formula, which relates a bivariate regression coefficient to the coefficient on
$S_i$ in a regression that includes additional covariates. If the omitted variables are positively related to earnings
($\beta>0$) and positively correlated with schooling ($\phi_{Sx}>0$), then $C(Y_i, S_i)/V(S_i)$ is larger than the causal effect of
schooling, $\rho$. A second consequence of (6a) and (6b) is that the OLS estimate of $\rho_r$ in equation (1) is in fact
consistent for the causal parameter, $\rho$. Note, however, that the way we have developed the problem of causal
inference, $E[S_i\varepsilon_i]=0$ is an assumption about $\varepsilon_i$ and $S_i$, whereas $E[X_i\varepsilon_i]=0$ is a statement about covariates that
is true by definition. This suggests that it is important to distinguish error terms that represent the random parts
of models for potential outcomes from mechanical decompositions where the relationship between errors and
regressors has no behavioral content.
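The omitted-variables-bias formula in equation (7) can be illustrated numerically. The sketch below assumes two covariates, normally distributed errors, and made-up coefficient values; the names `slope`, `short`, and `long_rho` are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.08                       # illustrative causal effect of schooling
beta = np.array([0.5, 0.2])      # illustrative coefficients on two covariates

x = rng.normal(size=(n, 2))                               # observed covariates X_i
s = 12.0 + x @ np.array([1.0, 0.5]) + rng.normal(size=n)  # schooling depends on X_i
eps = rng.normal(size=n)                                  # unrelated to S_i and X_i
y = 1.0 + rho * s + x @ beta + eps                        # eta_i = X_i'beta + eps_i

def slope(a, b):
    """Coefficient from a bivariate regression of a on b."""
    return np.cov(a, b)[0, 1] / np.var(b, ddof=1)

short = slope(y, s)                                       # left side of (7)
phi = np.array([slope(x[:, j], s) for j in range(2)])     # each element of X_i on S_i
ovb_prediction = rho + phi @ beta                         # right side of (7)

# "Long" regression of Y_i on a constant, S_i, and X_i
design = np.column_stack([np.ones(n), s, x])
long_coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
long_rho = long_coefs[1]                                  # close to rho under (6a)-(6b)
```

In this setup the short regression overstates the effect of schooling by roughly the amount the formula predicts, and adding the covariates brings the schooling coefficient back toward the assumed causal value.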

A  key  question  in  any  regression  study  is  whether  the  selection-on-observables  assumption  is  plausible. 
The assumption clearly makes sense when there is actual random assignment conditional on $X_i$. Even without
random  assignment,  however,  selection-on-observables  might  make  sense  if  we  know  a  lot  about  the  process 
generating  the  regressor  of  interest.  We  might  know,  for  example,  that  applicants  to  a  particular  college  or 
university  are  screened  using  certain  characteristics,  but  conditional  on  these  characteristics  all  applicants  are 
acceptable  and  chosen  on  a  first-come/first-serve  basis.  This  leads  to  a  situation  like  the  one  described  by 
Barnow, Cain, and Goldberger (1980, p. 47), where "Unbiasedness is attainable when the variables that
determined  the  assignment  are  known,  quantified,  and  included  in  the  equation."  Similarly,  Angrist  (1998) 
argued  that  because  the  military  is  known  to  screen  applicants  on  the  basis  of  observed  characteristics, 
comparisons  of  veteran  and  nonveteran  applicants  that  adjust  for  these  characteristics  have  a  causal 
interpretation.  The  case  for  selection-on-observables  in  a  generic  schooling  equation  is  less  clear  cut,  which 
is  why  so  much  attention  has  focused  on  the  question  of  omitted-variables  bias  in  OLS  estimates  of  schooling 
coefficients. 

Regression  pitfalls 

Schooling  is  not  randomly  assigned  and,  as  in  many  other  problems,  we  do  not  have  detailed 
institutional  knowledge  about  the  process  that  actually  determines  assignment.  The  choice  of  covariates  is 
therefore  crucial.  Obvious  candidates  include  any  variables  that  are  correlated  with  both  schooling  and 
earnings.  Test  scores  are  good  candidates  because  many  educational  institutions  use  tests  to  determine 
admissions  and  financial  aid.  On  the  other  hand,  it  is  doubtful  that  any  particular  test  score  is  a  perfect  control 
for  all  the  differences  in  earnings  potential  between  more  and  less  educated  individuals.  We  see  this  in  the  fact 
that  adding  family  background  variables  like  parental  income  further  reduces  the  size  of  schooling  coefficients. 
A  natural  question  about  any  regression  control  strategy  is  whether  the  estimates  are  highly  sensitive  to  the 
inclusion  of  additional  control  variables.  While  one  should  always  be  wary  of  drawing  causal  inferences  from 
a  regression  with  observational  data,  sensitivity  of  the  regression  results  to  changes  in  the  set  of  control 
variables is an extra reason to wonder whether there might be unobserved covariates that would change the
estimates  even  further. 

The  previous  discussion  suggests  that  Table  3  can  be  interpreted  as  showing  that  there  is  significant 
ability  bias  in  OLS  estimates  of  the  causal  effect  of  schooling  on  earnings.  On  the  other  hand,  a  number  of 
concerns  less  obvious  than  omitted-variables  bias  suggest  this  conclusion  may  be  premature.  A  theme  of  the 
Griliches  and  Chamberlain  papers  cited  in  the  table  is  that  the  negative  impact  of  ability  measures  on  schooling 
coefficients  is  eliminated  and  even  reversed  once  one  accounts  for  two  factors:  measurement  error  in  the 
regressor  of  interest,  and  the  use  of  endogenous  test  score  controls  that  are  themselves  affected  by  schooling. 

A  standard  result  in  the  analysis  of  measurement  error  is  that  if  variables  are  measured  with  an  additive 
error  that  is  uncorrelated  with  correctly-measured  values,  this  imparts  an  attenuation  bias  that  shrinks  OLS 
estimates  towards  zero  (see,  e.g.,  Griliches,  1986,  Fuller,  1987,  and  Section  4,  below).  The  proportionate 
reduction  is  one  minus  the  ratio  of  the  variance  of  correctly-measured  values  to  the  variance  of  measured 
values.  Furthermore,  the  inclusion  of  control  variables  that  are  correlated  with  actual  values  and  uncorrelated 
with  the  measurement  error  tends  to  aggravate  this  attenuation  bias.  The  intuition  for  this  result  is  that  the 
residual  variance  of  true  values  is  reduced  by  the  inclusion  of  additional  control  variables  while  the  residual 
variance  of  the  measurement  error  is  left  unchanged.  Although  studies  of  measurement  error  in  education  data 
suggest  that  only  10  percent  of  the  variance  in  measured  education  is  attributable  to  measurement  error,  it  turns 
out  that  the  downward  bias  in  regression  models  with  ability  and  other  controls  can  still  be  substantial.8 
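A simulation makes the interaction between attenuation bias and control variables concrete. In this sketch, measurement error accounts for 10 percent of the variance of measured schooling, and the control variable is correlated with true schooling but has no direct effect on earnings, so that only attenuation is at work. All parameter values are assumptions, and the closed-form expression used for the reliability with controls, (reliability - R²)/(1 - R²), is the standard textbook result rather than a formula quoted from this section.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
rho = 0.10                                   # illustrative true coefficient

x = rng.normal(size=n)                       # control correlated with true schooling
s_true = 2.0 * x + rng.normal(0.0, np.sqrt(5.0), size=n)   # V(s_true) = 9
s_obs = s_true + rng.normal(size=n)          # error variance 1: 10% of V(s_obs) = 10
y = rho * s_true + rng.normal(size=n)        # x has no direct effect on y here

def ols(y, *regressors):
    """OLS coefficients (constant first) of y on the given regressors."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

b_bivariate = ols(y, s_obs)[1]               # expected near 0.9 * rho
b_controls = ols(y, s_obs, x)[1]             # expected near (5/6) * rho

reliability = 9.0 / 10.0                     # V(true) / V(measured)
r2 = 4.0 / 10.0                              # population R^2 of s_obs on x
reliability_with_controls = (reliability - r2) / (1.0 - r2)   # = 5/6
```

With these assumed variances, adding the control raises the proportionate attenuation from 10 percent to about 17 percent, even though the measurement error itself is unchanged.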

8For a detailed elaboration of this point, see Welch, 1975, or Griliches, 1977, who notes (p. 13): "Clearly, the more
variables we put into the equation which are related to the systematic components of schooling, and the better we
'protect' ourselves against various possible biases, the worse we make the errors of measurement problem." We
present some new evidence on attenuation and covariates in Section 4, below.

A second complication raised in the early literature on regression estimates of the returns to schooling
is that variables used to control for ability may be endogenous (see, e.g., Griliches and Mason, 1972, or
Chamberlain, 1977). If wages and test scores are both outcomes that are affected by schooling, then test
scores cannot play the role of an exogenous, pre-determined control variable in a wage equation. To see this,
consider a simple example where the causal relationship of interest is (4), and $C(S_i, \eta_i)=0$ so that a bivariate
regression would in fact generate a consistent estimate of the causal effect. Suppose that schooling affects test
scores as well as earnings, and that the effect on test scores can be expressed using the model

(8)  $A_i = \gamma_0 + \gamma_1 S_i + \eta_{1i}$.

This relationship can be interpreted as reflecting the fact that more formal schooling tends to improve test
scores (so $\gamma_1>0$). We also assume that $C(S_i, \eta_{1i})=0$, so that OLS estimates of (8) would be consistent for $\gamma_1$.
The question is what happens if we add the outcome variable, $A_i$, to the schooling equation in a mistaken (in
this case) attempt to control for ability bias.

Endogeneity of $A_i$ in this context means that $\eta_i$ and $\eta_{1i}$ are correlated. Since people who do well on
standardized tests probably earn more for reasons other than the fact that they have more schooling, it seems
reasonable to assume that $C(\eta_i, \eta_{1i})>0$. In this case, the coefficient on $S_i$ in a regression of $Y_i$ on $S_i$ and $A_i$
is an inconsistent estimate of the effect of schooling. Evaluation of probability limits shows that the OLS
estimate of the schooling coefficient in a model that includes $A_i$ converges to

(9)  $C(Y_i, S_{\cdot Ai})/V(S_{\cdot Ai}) = \rho - \gamma_1\phi_{01}$,

where $S_{\cdot Ai}$ is the residual from a regression of $S_i$ on $A_i$ and $\phi_{01}$ is the coefficient from a regression of $\eta_i$ on $\eta_{1i}$
(see the Appendix for details). Since $\gamma_1>0$ and $\phi_{01}>0$, controlling for the endogenous test score variable tends
to make the estimate of the returns to schooling smaller, but this is not because of any omitted-variables bias
in the equation of interest. Rather it is a consequence of the bias induced by conditioning on an outcome
variable.9
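The result in equation (9) can be verified by simulation. The sketch below generates schooling independently of both error terms, so the bivariate regression is consistent, and then adds the schooling-affected test score as a control. Parameter values and the helper name `ols` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
rho, gamma1 = 0.08, 0.5          # illustrative values
phi01 = 0.1                      # coefficient from regressing eta on eta1

s = rng.normal(12.0, 2.0, size=n)                    # schooling, unrelated to both errors
eta1 = rng.normal(size=n)                            # test-score error
eta = phi01 * eta1 + rng.normal(0.0, 0.5, size=n)    # earnings error, correlated with eta1
a = 1.0 + gamma1 * s + eta1                          # equation (8)
y = 1.0 + rho * s + eta                              # equation (4)

def ols(y, *regressors):
    """OLS coefficients (constant first) of y on the given regressors."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

b_short = ols(y, s)[1]           # consistent for rho
b_long = ols(y, s, a)[1]         # converges to rho - gamma1 * phi01
```

With these assumed values the bivariate coefficient is close to 0.08, while the coefficient that conditions on the test score is close to 0.08 - 0.5 × 0.1 = 0.03, even though nothing is omitted from the equation of interest.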

9A similar problem may affect estimates of schooling coefficients in equations that control for occupation. Like test
scores and other ability measures, occupation is itself a consequence of schooling that is probably correlated with
unobserved earnings potential.

The problems of measurement error and endogenous regressors generate identification challenges that
lead researchers to use methods beyond the simple regression-control framework. The most commonly
employed strategies for dealing with these problems involve instrumental variables (IV), two-stage least
squares (2SLS), and latent-variable models. We briefly mention some 2SLS and latent-variable estimates, but
defer  a  detailed  discussion  of  2SLS  and  related  IV  strategies  until  Section  2.2.3.  The  major  practical  problem 
in  models  of  this  type  is  to  find  valid  instruments  for  schooling  and  ability.  Panel  B  reports  Griliches  (1977) 
2SLS  estimates  of  equation  (1)  treating  both  schooling  and  IQ  scores  as  endogenous.  The  instruments  are 
family  background  measures  and  a  second  ability  proxy.  Chamberlain  (1978)  develops  an  alternate  approach 
that  uses  panel  data  to  identify  the  effects  of  endogenous  schooling  in  a  latent-variable  model  for  unobserved 
ability.  Both  the  Chamberlain  (1978)  and  Griliches  (1977)  estimates  are  considerably  larger  than  the 
corresponding  OLS  estimates,  a  finding  which  led  these  authors  to  conclude  that  the  empirical  case  for  a 
negative  ability  bias  in  schooling  coefficients  is  much  weaker  than  the  OLS  estimates  suggest.10 

2.2.2 Fixed effects and differences-in-differences

The  main  idea  behind  fixed-effects  identification  strategies  is  to  use  repeated  observations  on 
individuals  (or  families)  to  control  for  unobserved  and  unchanging  characteristics  that  are  related  to  both 
outcomes  and  causing  variables.  A  classic  field  of  application  for  fixed-effects  models  is  the  attempt  to 
estimate  the  effect  of  union  status.  Suppose,  for  example,  that  we  would  like  to  know  the  effect  of  workers' 
union status on their wages. That is, for each worker, we imagine that there are two potential outcomes, $Y_{0i}$
denoting what the worker would earn if not a union member, and $Y_{1i}$ denoting what the worker would earn
as a union member. This is just like $Y_{Si}$ in the schooling example, except that here "S" is the dichotomous
variable, union status. The effect of union status on an individual worker is $Y_{1i} - Y_{0i}$, but this is never observed
directly since only one potential outcome is ever observed for each individual at any one time.11

10Another strand of the literature on causal effects of schooling uses sibling data to control for family effects that are
shared by siblings (early studies are by Gorseline, 1932 and Taubman, 1976; see also Griliches's (1979) survey).
Here the problem of measurement error is paramount (see Sections 2.2.2 and 4.1).

11This notation for counterfactual outcomes was used by Rubin (1974, 1977). Siegfried and Sweeney (1980) and
Chamberlain (1980) use a similar notation to discuss the effect of a classroom intervention on test scores.

Most analyses of the union problem begin with a constant-coefficients regression model for potential
outcomes, where

(10)  $Y_{0i} = X_i'\beta + \varepsilon_i$,
      $Y_{1i} = Y_{0i} + \delta$.

As in the schooling problem, $Y_{0i}$ has been decomposed into a linear function of observed covariates, $X_i'\beta$, and
a residual, $\varepsilon_i$, that is uncorrelated with $X_i$ by construction. Using $U_i$ to indicate union members, this leads to
the regression equation,

(11)  $Y_i = X_i'\beta + U_i\delta + \varepsilon_i$,

which  describes  the  causal  relationship  of  interest. 

Many  researchers  working  in  this  framework  have  argued  that  union  status  is  likely  to  be  related  to 
potential nonunion wages, $Y_{0i}$, even after conditioning on covariates, $X_i$ (see, e.g., Abowd and Farber, 1982;
or chapters 4 and 5 in Lewis, 1986). This means that $U_i$ is correlated with $\varepsilon_i$, so OLS does not estimate the
causal effect, $\delta$. An alternative to OLS uses panel data sets such as matched CPS rotation groups, the Panel
Study  of  Income  Dynamics,  or  the  National  Longitudinal  Surveys  and  exploits  repeated  observations  on 
individuals  to  control  for  unobserved  individual  characteristics  that  are  time-invariant.  A  well-known  study 
in  this  genre  is  Freeman  (1984). 

The  following  model,  similar  to  many  in  the  literature  on  union  status,  illustrates  the  fixed-effects 
approach. Modifying the previous notation to incorporate t = 1, ..., T observations on individuals, the
fixed-effects solution for this problem begins by writing

(12)  $Y_{0it} = X_{it}'\beta_t + \lambda\alpha_i + \xi_{it}$,

where $\alpha_i$ is an unobserved variable for person i, that we could, in principle, include as a control if it were
observed. Equation (12) is a regression decomposition with covariates $X_{it}$ and $\alpha_i$, so $\xi_{it}$ is uncorrelated with
$X_{it}$ and $\alpha_i$ by construction ($X_{it}$ can include characteristics from different periods). The causal/regression model
for panel data is now

(13)  $Y_{it} = X_{it}'\beta_t + U_{it}\delta_t + \lambda\alpha_i + \xi_{it}$,

where we have allowed the causal effect of interest to be time-varying. The identifying assumptions are that
the coefficient $\lambda$ does not vary across periods and that

(14)  $E[U_{is}\xi_{it}] = 0$ for s = 1, ..., T.

In other words, whatever the source of correlation is between $U_{it}$ and unobserved earnings potential, it can be
described by an additive time-invariant covariate, $\alpha_i$, that has the same coefficient each period. Since
differencing eliminates $\lambda\alpha_i$, OLS estimates of the differenced equation

(15)  $Y_{it} - Y_{it-k} = X_{it}'\beta_t - X_{it-k}'\beta_{t-k} + U_{it}\delta_t - U_{it-k}\delta_{t-k} + (\xi_{it} - \xi_{it-k})$

are consistent for the parameters of interest.

Any transformation of the data that eliminates the unobserved $\alpha_i$ can be used to estimate the
parameters of interest in this model. One of the most popular estimators in this case is the deviations-from-means
or the Analysis of Covariance (ANCOVA) estimator, which is most often used for models where $\beta_t$ and
$\delta_t$ are assumed to be fixed over time. The analysis of covariance estimator is OLS applied to

(16)  $Y_{it} - \bar{Y}_i = \beta'(X_{it} - \bar{X}_i) + \delta(U_{it} - \bar{U}_i) + (\xi_{it} - \bar{\xi}_i)$,

where  overbars  denote  person-averages.  Analysis  of  covariance  is  preferable  to  differencing  on  efficiency 
grounds  in  some  cases;  for  models  with  normally  distributed  homoscedastic  errors,  ANCOVA  is  the  maximum 
likelihood  estimator.  An  alternative  econometric  strategy  for  the  estimation  of  models  with  individual  effects 
uses  repeated  observations  on  cohort  averages  instead  of  repeated  data  on  individuals.  For  details  and 
examples  see  Ashenfelter  (1984)  or  Deaton  (1985). 
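The differencing and deviations-from-means estimators can be compared with OLS in levels in a short simulation. The sketch assumes a balanced panel with no covariates, a constant union effect, and union status that is more likely for workers with a high individual effect but unrelated to the transitory error in every period, as in (14). Parameter values and the helper name `slope` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 20_000, 4
delta, lam = 0.15, 1.0                        # illustrative union effect and loading

alpha = rng.normal(size=n)                    # unobserved individual effect
# Union status is more likely for high-alpha workers but otherwise unrelated to xi
u = (alpha[:, None] + rng.normal(size=(n, T)) > 0).astype(float)
xi = rng.normal(0.0, 0.5, size=(n, T))
y = delta * u + lam * alpha[:, None] + xi     # equation (13) without X, constant delta

def slope(a, b):
    """Bivariate OLS slope after stacking all person-period observations."""
    a, b = a.ravel(), b.ravel()
    return np.cov(a, b)[0, 1] / np.var(b, ddof=1)

b_levels = slope(y, u)                                       # pooled OLS in levels
b_diff = slope(y[:, 1:] - y[:, :-1], u[:, 1:] - u[:, :-1])   # first differences, k = 1
b_within = slope(y - y.mean(axis=1, keepdims=True),
                 u - u.mean(axis=1, keepdims=True))          # deviations from person means
```

In this design the levels estimate is far above the assumed effect because union status is correlated with the individual effect, while both transformed estimators are close to the assumed value of 0.15.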

Finally,  note  that  while  standard  fixed-effects  estimators  can  only  be  used  to  estimate  the  effects  of 
time-varying regressors, Hausman and Taylor (1981) have developed a hybrid panel/IV procedure for models
with time-invariant regressors (like schooling). It is also worth noting that even if the causing variable of
interest is time-invariant, we can use standard fixed-effects estimators to estimate changes in the effect of a
time-invariant variable. For example, the estimating equation for a model with fixed $U_i$ is

(17)  $Y_{it} - Y_{it-k} = X_{it}'\beta_t - X_{it-k}'\beta_{t-k} + U_i(\delta_t - \delta_{t-k}) + (\xi_{it} - \xi_{it-k})$,

so $(\delta_t - \delta_{t-k})$ is identified. Angrist (1995b) used this method to estimate changes in schooling coefficients in
the West Bank and Gaza Strip even though schooling is approximately time-invariant.

Fixed-effects  pitfalls 

The  use  of  panel  data  to  eliminate  bias  from  unobserved  individual  effects  raises  a  number  of 
econometric  and  statistical  issues.  Since  this  material  is  covered  in  Chamberlain's  (1984)  chapter  in  The 
Handbook  of  Econometrics,  we  limit  our  discussion  to  an  overview  of  problems  that  have  been  of  particular 
concern  to  labor  economists.  First,  analysis  of  covariance  and  differencing  estimators  are  not  consistent  when 
the process determining $U_{it}$ involves lagged dependent variables. This issue comes up in the analysis of
training  programs  because  participants  often  experience  a  pre-program  decline  in  earnings,  a  fact  first  noted 
by  Ashenfelter  (1978).  If  past  earnings  are  observed,  the  simplest  strategy  in  this  case  is  simply  to  control  for 
past  earnings  either  by  including  lagged  earnings  as  a  regressor  or  in  matched  treatment-control  comparisons 
(see,  e.g.,  Dehejia  and  Wahba,  1995;  Heckman,  Ichimura,  and  Todd,  1997).  In  fact,  the  question  of  whether 
trainees  and  a  candidate  comparison  group  have  similar  lagged  outcomes  is  sometimes  seen  as  a  litmus  test 
for  the  legitimacy  of  the  comparison  group  in  the  evaluation  of  training  programs  (see,  e.g.,  Heckman  and 
Hotz,  1989). 

A problem arises in this context, however, when the process determining $U_{it}$ involves past outcomes
and an unobserved covariate, $\alpha_i$. Ashenfelter and Card (1985) discuss an example involving the effect of
training on the Social Security-taxable earnings of trainees under the Comprehensive Employment and
Training Act (CETA). They propose a model of training status where individuals who enter CETA training
in year t do so because they have low $\alpha_i$ and their earnings were unusually low in year t-1. Suppose initially
we ignore the fact that training status involves past earnings, and estimate an equation like (15). Ignoring other
covariates, this amounts to comparing the earnings growth of trainees and controls. But whatever the true
program effect is, the growth in the earnings of CETA trainees from year t-1 to year t+1 will tend to be larger
than the earnings growth in a candidate control group simply because of regression-to-the-mean. This
generates a spurious positive training effect and the conventional differencing method breaks down.12
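The regression-to-the-mean problem can be seen in a stylized simulation. This is a simplified sketch rather than the Ashenfelter and Card (1985) model: earnings are a permanent component plus a serially uncorrelated shock, the program has no effect at all, and individuals are assumed to enter training whenever their year t-1 earnings fall below an arbitrary threshold.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
true_effect = 0.0                             # the program does nothing in this sketch

alpha = rng.normal(size=n)                    # permanent earnings component
y_pre = alpha + rng.normal(size=n)            # earnings in year t-1
# Enter training in year t when year t-1 earnings are unusually low
trained = y_pre < -1.0
y_post = alpha + true_effect * trained + rng.normal(size=n)   # earnings in year t+1

growth = y_post - y_pre
# Difference in earnings growth between trainees and non-trainees
dd_estimate = growth[trained].mean() - growth[~trained].mean()
```

Because trainees are selected partly on a transitory dip, their earnings recover on average even without any program effect, so the comparison of growth rates is clearly positive here although the assumed true effect is zero.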

A natural strategy for dealing with this problem might seem to be to add $Y_{it-1}$ to the list of control
variables, and then difference away the fixed effect in a model with $Y_{it-1}$ as a regressor. The problem is that now
any transformation that eliminates the fixed effect will leave at least one regressor - the lagged dependent
variable - correlated with the errors in the transformed equation. Although the lagged dependent variable is
not the regressor of interest, the fact that it is correlated with the error term in the transformed equation means
that the estimate of the coefficient on $U_{it+1}$ is biased as well. A detailed description of this problem, and the
solutions  that  have  been  proposed  for  it,  raises  technical  issues  beyond  the  scope  of  this  chapter.  A  useful 
reference  is  Nickell,  1981,  especially  pages  1423-1424.  See  also  Card  and  Sullivan's  (1988)  study  of  the  effect 
of  CETA  training  on  the  employment  rates  of  trainees,  which  reports  both  fixed-effects  estimates  and  matching 
estimates  that  control  for  lagged  outcomes. 

A  second  potential  problem  with  fixed-effects  estimators  is  that  bias  from  measurement  error  is  usually 
aggravated  by  transformations  that  eliminate  the  individual  effects  (see,  e.g.,  Freeman,  1984;  Griliches  and 
Hausman,  1986).  This  fact  provides  an  alternative  explanation  for  why  fixed-effects  estimates  often  turn  out 
to  be  smaller  than  estimates  in  levels.  Finally,  perhaps  the  most  important  problem  with  this  approach  is  that 
the  assumption  that  omitted  variables  can  be  captured  by  an  additive,  time-invariant  individual  effect  is 
arbitrary  in  the  sense  that  it  usually  does  not  come  from  economic  theory  or  from  information  about  the 
relevant institutions.13 On the other hand, the fixed-effects approach has a superficial plausibility ("whatever
makes  us  special  is  timeless")  and  an  identification  payoff  that  is  hard  to  beat.  Also,  fixed-effects  models  lend 
themselves to a variety of specification tests. See, for example, Ashenfelter and Card (1985), Chamberlain
(1984), Griliches and Hausman (1986), Angrist and Newey (1991), and Jakubson (1991). Many of these
studies also focus on the union example.

12Deviations-from-means estimators are also biased in this case.

13An exception is the literature on life-cycle labor supply (e.g., MaCurdy, 1981; Altonji, 1986).

The  Differences-in-Differences  (DD)  model 

Differences-in-differences  strategies  are  simple  panel-data  methods  applied  to  sets  of  group  means  in 
cases  when  certain  groups  are  exposed  to  the  causing  variable  of  interest  and  others  are  not.  This  approach, 
which  is  transparent  and  often  at  least  superficially  plausible,  is  well-suited  to  estimating  the  effect  of  sharp 
changes  in  the  economic  environment  or  changes  in  government  policy.  The  DD  method  has  been  used  in 
hundreds  of  studies  in  economics,  especially  in  the  last  two  decades,  but  the  basic  idea  has  a  long  history.  An 
early  example  in  labor  economics  is  Lester  (1946),  who  used  the  differences-in-differences  technique  to  study 
employment  effects  of  minimum  wages.14 

The  DD  approach  is  explained  here  using  Card's  (1990)  study  of  the  effect  of  immigration  on  the 
employment  of  natives  as  an  example.  Some  observers  have  argued  that  immigration  is  undesirable  because 
low-skilled  immigrants  may  displace  low-skilled  or  less-educated  US  citizens  in  the  labor  market.  Anecdotal 
evidence  for  this  claim  includes  newspaper  accounts  of  hostility  between  immigrants  and  natives  in  some 
cities,  but  the  empirical  evidence  is  inconclusive.  See  Friedberg  and  Hunt  (1995)  for  a  survey  of  research  on 
this  question.  As  in  our  earlier  examples,  the  object  of  research  on  immigration  is  to  find  some  sort  of 
comparison that provides a compelling answer to "what if" questions about the consequences of immigration.

Card's  study  used  a  sudden  large-scale  migration  from  Cuba  to  Miami  known  as  the  Mariel  Boatlift 
to  make  comparisons  and  answer  counterfactual  questions  about  the  consequences  of  immigration.  In 
particular,  Card  asks  whether  the  Mariel  immigration,  which  increased  the  Miami  labor  force  by  about  7 
percent  between  May  and  September  of  1980,  reduced  the  employment  or  wages  of  non-immigrant  groups. 
An important component of this identification strategy is the selection of comparison cities that can be used
to estimate what would have happened in the Miami labor market absent the Mariel immigration.

14The DD method goes by different names in different fields. Psychologist Campbell (1969) calls it the
"non-equivalent control-group pretest-posttest design."

The  comparison  cities  Card  used  in  the  Mariel  Boatlift  study  were  Atlanta,  Los  Angeles,  Houston,  and 
Tampa-St.  Petersburg.  These  cities  were  chosen  because,  like  Miami,  they  have  large  Black  and  Hispanic 
populations and because discussions of the impact of immigrants often focus on the consequences for
minorities.  Most  importantly,  these  cities  appear  to  have  employment  trends  similar  to  those  in  Miami  at  least 
since  1976.  This  is  documented  in  Figure  1,  which  is  similar  to  a  figure  in  Card's  (1989)  working  paper  that 
did  not  appear  in  the  published  version  of  his  study.  The  figure  plots  monthly  observations  on  the  log  of 
employment  in  Miami  and  the  four  comparison  cities  from  1970  through  1998.  The  two  series,  which  are  from 
BLS  establishment  data,  have  been  normalized  by  subtracting  the  1970  value. 

Table  4  illustrates  DD  estimation  of  the  effect  of  Boatlift  immigrants  on  unemployment  rates, 
separately  for  whites  and  blacks.  The  first  column  reports  unemployment  rates  in  1979,  the  second  column 
reports  unemployment  rates  in  1981,  and  the  third  column  reports  the  1981-1979  difference.  The  rows  give 
numbers  for  Miami,  the  comparison  cities,  and  the  difference  between  them.  For  example,  between  1981  and 
1979,  the  unemployment  rate  for  Blacks  in  Miami  rose  by  about  1.3  percent,  though  this  change  is  not 
significant.  Unemployment  rates  in  the  comparisons  cities  rose  even  more,  by  2.3  percent.  The  difference  in 
these  two  changes,  -1.0  percent,  is  a  DD  estimate  of  the  effect  of  the  Mariel  immigrants  on  the  unemployment 
rate  of  Blacks  in  Miami.  In  this  case,  the  estimated  effect  on  the  unemployment  rate  is  actually  negative, 
though  not  significantly  different  from  zero. 
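
The arithmetic behind this example can be sketched in a few lines of Python. The function name `dd_estimate` is an illustrative choice, and the unemployment-rate levels below are hypothetical: the text quotes only the two changes (1.3 and 2.3), so the levels are picked solely to reproduce those changes.

```python
def dd_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: change for the treated group minus change for the comparison group."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical levels (in percent), chosen only to match the changes quoted in the text:
# +1.3 for Blacks in Miami and +2.3 in the comparison cities.
miami_1979, miami_1981 = 8.0, 9.3
comp_1979, comp_1981 = 10.0, 12.3

dd = dd_estimate(miami_1979, miami_1981, comp_1979, comp_1981)
print(round(dd, 1))
```

Only the two changes matter for the estimate; any pair of levels with the same changes gives the same DD value.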

The  rationale  for  this  double-differencing  strategy  can  be  explained  in  terms  of  restrictions  on  the 
conditional  mean  function  for  potential  outcomes  in  the  absence  of  immigration.  As  in  the  union  example,  let 
Y_{0i} be i's employment status in the absence of immigration and let Y_{1i} be i's employment status if the Mariel
immigrants come to i's city. The unemployment rate in city c in year t is E[Y_{0i} | c, t] with no immigration
wave, and E[Y_{1i} | c, t] if there is an immigration wave. In practice, we know that the Mariel immigration
happened in Miami in 1980, so that the only values of E[Y_{1i} | c, t] we get to see are for c=Miami and t>1980.
The Mariel Boatlift study uses the comparison cities to estimate the counterfactual average, E[Y_{0i} | c=Miami,
t>1980], i.e., what the unemployment rate in Miami would have been if the Mariel immigrants had not come.
The DD method identifies causal effects by restricting the conditional mean function E[Y_{0i} | c, t] in
a particular way. Specifically, suppose that

(18)  E[Y_{0i} | c, t] = \beta_t + \gamma_c,

that is, in the absence of immigration, unemployment rates can be written as the sum of a year effect that is
common to cities and a city effect that is fixed over time. The additive model pertains to E[Y_{0i} | c, t] instead
of Y_{0i} directly because the latter is a zero/one variable. Suppose also that the effect of the Mariel immigration
is simply to add a constant to E[Y_{0i} | c, t], so that

(19)  E[Y_{1i} | c, t] = E[Y_{0i} | c, t] + \delta.

This means the employment status of individuals living in Miami and the comparison cities in 1979 and 1981
can be written as

(20)  Y_i = \beta_t + \gamma_c + \delta M_i + \epsilon_i,

where E[\epsilon_i | c, t] = 0 and M_i is a dummy variable that equals 1 if i was exposed to the Mariel immigration by
living in Miami after 1980. Differencing unemployment rates across cities and years gives

(21)  {E[Y_i | c=Miami, t=1981] - E[Y_i | c=Comparison, t=1981]}
      - {E[Y_i | c=Miami, t=1979] - E[Y_i | c=Comparison, t=1979]} = \delta.

Note that M_i in equation (20) is an interaction term equal to the product of a dummy indicating
observations after 1980 and a dummy indicating residence in Miami. The DD estimate can therefore also be
computed in a regression of stacked micro data for cities and years. The regressors consist of dummies for
years, dummies for cities, and M_i. Similarly, a regression-adjusted version of the DD estimator adds a vector
of individual characteristics, X_i, to equation (20):

      Y_i = X_i'\beta_0 + \beta_t + \gamma_c + \delta M_i + \epsilon_i,

where \beta_0 is now a vector of coefficients that includes a constant. Controlling for X_i changes the estimate of
\delta only if M_i and X_i are correlated, conditional on city and year main effects.
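
The equivalence between the double difference of cell means and the coefficient on the interaction dummy in a stacked regression can be checked numerically. The following is a minimal sketch on simulated micro data; every number in it (sample size, cell probabilities, the value of `delta_true`) is a hypothetical choice made for illustration, not a figure from the Mariel study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated micro data (all values hypothetical): two city groups, two years.
n = 4000
miami = rng.integers(0, 2, n)          # 1 = Miami, 0 = comparison city
post = rng.integers(0, 2, n)           # 1 = 1981, 0 = 1979
m = miami * post                       # interaction dummy: exposed to the immigration wave
delta_true = -0.01
p = 0.08 + 0.02 * post - 0.015 * miami + delta_true * m   # conditional mean given city and year
y = (rng.random(n) < p).astype(float)  # zero/one outcome

# OLS of the outcome on a constant, a year dummy, a city dummy, and the interaction.
X = np.column_stack([np.ones(n), post, miami, m])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
dd_regression = coef[3]

# The same number computed from the four cell means.
def cell(c, t):
    return y[(miami == c) & (post == t)].mean()

dd_means = (cell(1, 1) - cell(0, 1)) - (cell(1, 0) - cell(0, 0))
print(dd_regression, dd_means)
```

Because the regression is saturated in city and year, the two numbers agree up to floating-point error; adding individual covariates to `X` would give a regression-adjusted estimate that need not equal the raw double difference.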

DD  Pitfalls 

Like  any  other  identification  strategy,  DD  is  not  guaranteed  to  identify  the  causal  effect  of  interest. 
Meyer  (1995)  and  Campbell  (1969)  outline  a  range  of  threats  to  the  causal  interpretation  of  DD  estimates.  The 
key  identifying  assumption  is  clearly  that  interaction  terms  are  zero  in  the  absence  of  the  intervention.  In  fact, 
it  is  easy  to  imagine  that  unemployment  rates  evolve  differently  across  cities  regardless  of  shocks  like  the 
Mariel  immigration.  One  way  to  test  this  is  to  compare  trends  in  outcomes  before  or  after  the  event  of  interest. 
As  noted  above,  the  comparison  cities  in  this  case  were  chosen  partly  on  the  basis  of  Figure  1,  which  shows 
that  the  comparison  cities  exhibited  a  pattern  of  economic  growth  similar  to  that  in  Miami.  Identification  of 
causal  effects  using  city/year  comparisons  clearly  turns  on  the  assumption  that  the  two  sets  of  cities  would  have 
had  the  same  employment  trends  had  the  boatlift  not  occurred.  We  introduce  some  new  evidence  on  this 
question  in  Section  2.4. 
2.2.3.  Instrumental  Variables 

Identification  strategies  based  on  instrumental  variables  can  be  thought  of  as  a  scheme  for  using 
exogenous  field  variation  to  approximate  randomized  trials.  Again,  we  illustrate  with  an  example  where  there 
is  an  underlying  causal  relationship  of  interest,  in  this  case  the  effect  of  Vietnam-era  military  service  on  the 
earnings of veterans later in life. In the 1960s and early 1970s, young men were at risk of being drafted for
military  service.  Policy  makers,  veterans  groups,  and  economists  have  long  been  interested  in  what  the 
consequences  of  this  military  service  were  for  the  men  involved.  A  belief  that  military  service  is  a  burden 
helped  to  mobilize  support  for  a  range  of  veterans'  programs  and  for  ending  the  draft  in  1973  (see,  e.g., 
Taussig,  1974).  Concerns  about  fairness  also  led  to  the  institution  of  a  draft  lottery  in  1970  that  was  used  to 
determine priority for conscription in cohorts of 19-year-olds. This lottery was used by Hearst, Newman, and
Hulley (1986) to estimate the effects of military service on civilian mortality and by Angrist (1990) to construct
IV estimates of the effects of military service on civilian earnings.

As  in  the  union  problem,  the  causal  relationship  of  interest  is  based  on  the  notion  that  there  are  two 
potential outcomes, Y_{0i}, denoting what someone from the Vietnam-era cohort would earn if they did not serve
in the military and Y_{1i}, denoting earnings as a veteran. Again, using a constant-effects model for potential
outcomes, we can write

(22)  Y_{0i} = \beta_0 + \eta_i
      Y_{1i} = Y_{0i} + \delta,

where \beta_0 \equiv E[Y_{0i}]. The constant effect \delta is the parameter of interest. IV estimates can be interpreted under
weaker assumptions than this, but we postpone a discussion of this point until Section 2.3. As in the union
and schooling problems, \eta_i is the random part of potential outcomes, but at this point there are no observed
covariates in the model for Y_{0i}. Using D_i to indicate veteran status, the causal relationship of interest can be
written

(23)  Y_i = \beta_0 + D_i\delta + \eta_i.

Also as in the union and schooling problems, there is a concern that since D_i is not randomly assigned, a
comparison of all veterans to all nonveterans would not identify the causal effect of interest. Suppose, for
example, that individuals with low civilian earnings potential are more likely to serve in the military, either
because they want to or because they are less adept at obtaining deferments. Then the regression coefficient
in (23), which is also the difference in means by veteran status, is biased downwards:

(24)  E[Y_i | D_i=1] - E[Y_i | D_i=0] = \delta + {E[\eta_i | D_i=1] - E[\eta_i | D_i=0]} < \delta.

IV methods can eliminate this sort of bias if the researcher has access to an instrumental variable, Z_i,
that is correlated with D_i but otherwise independent of potential outcomes. A natural instrument is
draft-eligibility status, since this was determined by a lottery over birthdays. In particular, in each year from 1970
to 1972, random sequence numbers (RSNs) were randomly assigned to each birth date in cohorts of
19-year-olds. Men with lottery numbers below an eligibility ceiling were eligible for the draft, while men with numbers
above the ceiling could not be drafted. In practice, many draft-eligible men were still exempted from service
for health or other reasons, while many men who were draft-exempt nevertheless volunteered for service. So
veteran status was not completely determined by randomized draft-eligibility; eligibility and veteran status are
merely correlated.

For  white  men  who  were  at  risk  of  being  drafted  in  the  1970-71  draft  lotteries,  draft-eligibility  is 
clearly  associated  with  lower  earnings  in  years  after  the  lottery.  This  can  be  seen  in  Table  5,  which  reports  the 
effect  of  randomized  draft-eligibility  status  on  Social  Security  earnings  in  column  (3).  Column  (1)  shows 
average  annual  earnings  for  purposes  of  comparison.  These  data  are  the  FICA-taxable  earnings  of  men  with 
earnings  covered  by  OASDI;  for  details  see  the  appendix  to  Angrist  (1990).  For  men  born  in  1950,  there  are 
significant  negative  effects  of  eligibility  status  on  earnings  in  1970,  when  these  men  were  being  drafted,  and 
in  1981,  ten  years  later.  In  contrast,  there  is  no  evidence  of  an  association  between  eligibility  status  and 
earnings in 1969, the year the lottery drawing for men born in 1950 was held but before anyone born in 1950
was actually drafted. Similarly, for men born in 1951, there are large negative eligibility effects in 1971 and
1981, but no evidence of an effect in 1970, before anyone born in 1951 was actually drafted. The timing of
these  effects  suggests  that  the  negative  association  between  draft-eligibility  status  and  earnings  is  caused  by 
the  military  service  of  draft-eligible  men. 

Because  eligibility  status  was  randomly  assigned,  the  claim  that  the  estimates  in  column  (3)  represent 
the effect of draft-eligibility on earnings seems uncontroversial. How do we go from the effect of
draft-eligibility to the effect of veteran status? The identifying assumption in this case is that Z_i is independent of
potential earnings, which in this case means that Z_i is uncorrelated with \eta_i. It follows immediately that \delta =
C(Y_i, Z_i)/C(D_i, Z_i). The intuition here is that only part of the variation in D_i, the part that is associated with
Z_i, is used to identify the parameter of interest (\delta). Because Z_i is a binary variable, we also have

(25)  \delta = {E[Y_i | Z_i=1] - E[Y_i | Z_i=0]} / {E[D_i | Z_i=1] - E[D_i | Z_i=0]}.

The sample analog of (25) is the Wald (1940) estimator that was originally applied to measurement error
problems.15 Note that we could have arrived at (25) directly, i.e., without reference to the C(Y_i, Z_i)/C(D_i, Z_i)
formula, because the independence of Z_i and potential outcomes implies E[\eta_i | Z_i] = 0. In this case, the Wald
estimator  is  simply  the  difference  in  mean  earnings  between  draft-eligible  and  ineligible  men,  divided  by  the 
difference  in  the  probability  of  serving  in  the  military  between  draft-eligible  and  ineligible  men. 
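
As an illustration, the Wald estimator and its equivalence with the ratio of covariances can be sketched as follows. The data are simulated and every parameter value (the true effect of -2.0, the strength of the first stage, the sample size) is an assumption made for the example; the helper name `wald` is likewise illustrative.

```python
import numpy as np

def wald(y, d, z):
    """Wald estimator: difference in mean outcomes by instrument status,
    divided by the difference in treatment rates by instrument status."""
    y, d, z = map(np.asarray, (y, d, z))
    num = y[z == 1].mean() - y[z == 0].mean()
    den = d[z == 1].mean() - d[z == 0].mean()
    return num / den

rng = np.random.default_rng(1)
n = 20000
z = rng.integers(0, 2, n)                     # randomized binary instrument
eta = rng.normal(size=n)                      # random part of potential outcomes
# Treatment depends on the instrument and on eta (selection), so a simple
# comparison of means by treatment status is biased.
d = (1.0 * z - 0.5 * eta + rng.normal(size=n) > 0.5).astype(float)
delta_true = -2.0
y = 10.0 + delta_true * d + eta

wald_hat = wald(y, d, z)
iv_cov = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]   # ratio of sample covariances
naive = y[d == 1].mean() - y[d == 0].mean()         # difference in means by treatment status
print(wald_hat, iv_cov, naive)
```

With a binary instrument the two IV formulas coincide exactly in the sample, while the naive comparison is pulled below the true effect because low-`eta` individuals are more likely to be treated in this simulation.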

The  only  information  required  to  go  from  draft-eligibility  effects  to  veteran-status  effects  is  the 
denominator  of  the  Wald  estimator,  which  is  the  effect  of  draft-eligibility  on  the  probability  of  serving  in  the 
military.  This  information,  which  comes  from  the  Survey  of  Income  and  Program  Participation  (SIPP), 
appears  in  column  (4)  of  Table  5.16  For  earnings  in  1981,  long  after  most  Vietnam-era  servicemen  were 
discharged  from  the  military,  the  Wald  estimates  of  the  effect  of  military  service  amount  to  about  16  percent 
of  earnings.  Effects  for  men  while  in  the  service  are  much  larger,  which  is  not  surprising  since  military  pay 
during  the  conscription  era  was  extremely  low. 

An important feature of the Wald/IV estimator is that the identifying assumptions are easy to assess
and interpret. The basic claim justifying a causal interpretation of the estimator is that the only reason why
E[Y_i | Z_i] varies with Z_i is because E[D_i | Z_i] varies with Z_i. A simple way to check is to look for an association
between Z_i and personal characteristics that should not be affected by D_i, such as age, race, sex, or any other
characteristic that was determined before D_i was determined. Another useful check is to look for an association
between  the  instrument  and  outcomes  in  samples  where  there  is  no  reason  for  such  a  relationship.  If  it  really 
is  true  that  the  only  reason  why  draft-eligibility  affects  earnings  is  veteran  status,  then  in  samples  where 
eligibility  status  is  unrelated  to  veteran  status,  draft-eligibility  effects  on  earnings  should  be  zero.  This  idea 
is illustrated in Table 5, which reports estimates for men born in 1953. Although there was a lottery drawing
which assigned RSNs to the 1953 cohort in February of 1972, no one born in 1953 was actually drafted (the
draft officially ended in July 1973). This is reflected in the first-stage relationship between draft-eligibility
and veteran status for men born in 1953 (with eligibility defined using the 1952 RSN cutoff of 95), which shows an
insignificant difference in the probability of serving by eligibility status. There is also no significant
relationship between Y_i and Z_i for this cohort. Evidence of a relationship between Z_i and Y_i would cast doubt
on the claim that the only reason for draft-eligibility effects is the military service of the men who were
draft-eligible. We discuss other specification checks of this type in Section 2.4.

15 The relationship between IV with binary instruments and Wald estimators was first noted by Durbin (1954).

16 In this case, the denominator of the Wald estimates does not come from the same data set as the numerator since
the Social Security Administration has no information on veteran status. As long as the information used to estimate
the numerator and denominator is representative of the same population, the resulting two-sample estimate will be
consistent. The econometrics behind this two-sample approach to IV are discussed briefly in Section 3.4, below.

So  far  the  discussion  of  IV  has  allowed  for  only  three  variables:  the  outcome,  the  endogenous 
regressor, and the instrument. In many cases, the assumption that E[Z_i \eta_i] = 0 is more plausible after controlling
for a vector of covariates, X_i. Decomposing the random part of potential outcomes in (22) into a linear
function of k control variables and an error term so that \eta_i = X_i'\beta + \epsilon_i as before, the resulting estimating
equation is

(26)  Y_i = X_i'\beta + D_i\delta + \epsilon_i.

Note that since \epsilon_i is defined as the residual from a regression of \eta_i on X_i, it is uncorrelated with X_i by
construction. In contrast with \delta, which has a causal interpretation, the coefficient vector \beta is not meant to
capture  the  causal  effect  of  the  X-variables.  As  in  the  discussion  of  regression,  we  make  a  clear  distinction 
between  control  variables  and  causing  variables. 

Equations like (26) are typically estimated using 2SLS, i.e., by substituting for D_i the fitted values from a
first-stage regression of D_i on X_i and Z_i. In some applications, more than one instrument is available to
estimate the single causal effect, \delta. 2SLS accommodates this situation by including all the instruments in the
first-stage  equation.  The  combination  of  multiple  instruments  to  produce  a  single  estimate  makes  the  most 
sense  in  a  constant-coefficients  framework.  The  assumption  of  instrument  validity  and  constant  coefficients 
can  also  be  tested  in  this  case  (see,  e.g.,  Hansen,  1982;  Newey,  1985).  In  a  more  general  setting  with 
heterogeneous  potential  outcomes,  different  instruments  estimate  different  weighted  averages  of  the  difference 
Y_{1i} - Y_{0i} (Imbens and Angrist, 1994). We return to this point in Section 2.3.
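
The two-step logic of 2SLS with covariates and more than one instrument can be sketched directly with least squares. This is a minimal illustration on simulated data, not the estimation code used for any result in this chapter; the function names `ols` and `tsls` and all parameter values are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tsls(y, d, X, Z):
    """2SLS for y = X @ beta + d * delta + error, with excluded instruments Z.

    X must include a constant column. Returns (beta, delta).
    """
    W = np.column_stack([X, Z])            # first-stage regressors: covariates and instruments
    d_hat = W @ ols(W, d)                  # fitted values of the endogenous regressor
    coef = ols(np.column_stack([X, d_hat]), y)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = rng.integers(0, 2, (n, 2)).astype(float)   # two binary instruments
eps = rng.normal(size=n)
# The regressor is correlated with the error term, so OLS is biased upwards here.
d = 0.5 * Z[:, 0] + 0.3 * Z[:, 1] + 0.4 * x + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 0.5 * x + 2.0 * d + eps              # true delta = 2.0

beta_hat, delta_hat = tsls(y, d, X, Z)
delta_ols = ols(np.column_stack([X, d]), y)[-1]
print(delta_hat, delta_ols)
```

The point estimates from this manual two-step procedure are the 2SLS estimates, but the standard errors from the second-step regression would not be the correct 2SLS standard errors, so packaged routines are preferable for inference.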

IV Pitfalls

The most important IV pitfall is the validity of instruments, i.e., the possibility that \eta_i and Z_i are
correlated. Suppose, for example, that Z_i is related to the vector of control variables, X_i, and we do not account
for this in the estimation. The Wald/IV estimator in that case has probability limit

      \delta + \beta'{E[X_i | Z_i=1] - E[X_i | Z_i=0]} / {E[D_i | Z_i=1] - E[D_i | Z_i=0]}.

This is a version of the omitted-variables bias formula for IV. The formula captures the fact that "a little
omitted variables bias can go a long way" in an IV setting, because the association between X_i and Z_i gets
multiplied by {E[D_i | Z_i=1] - E[D_i | Z_i=0]}^{-1}. In the draft lottery case, for example, any draft-eligibility effects on
omitted variables get multiplied by about 1/0.15 \approx 6.7.
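
The amplification in this formula is easy to evaluate numerically. In the sketch below, the first stage of 0.15 is the figure quoted in the text; the function name and the imbalance of 0.01 in the omitted variable are hypothetical values used only to show the scaling.

```python
def wald_plim_with_omitted(delta, beta, diff_x_by_z, first_stage):
    """Probability limit of the Wald estimator when an omitted variable X is related to Z.

    delta        : causal effect of D
    beta         : coefficient on the omitted variable X
    diff_x_by_z  : E[X | Z=1] - E[X | Z=0]
    first_stage  : E[D | Z=1] - E[D | Z=0]
    """
    return delta + beta * diff_x_by_z / first_stage

# With a first stage of about 0.15, any imbalance in the omitted variable
# is scaled up by roughly 1 / 0.15.
amplification = 1 / 0.15
bias = wald_plim_with_omitted(0.0, 1.0, 0.01, 0.15)   # hypothetical imbalance of 0.01
print(round(amplification, 1), round(bias, 3))
```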

A  second  important  point  about  bias  in  instrumental  variables  estimates  is  that  random  assignment 
alone  does  not  guarantee  a  valid  instrument.  Suppose,  for  example,  that  in  addition  to  being  more  likely  to 
serve  in  the  military,  men  with  low  draft-lottery  numbers  were  more  likely  to  stay  in  college  so  as  to  extend 
a draft deferment. This fact will create a relationship between potential earnings and Z_i even for nonveterans,
in which case IV yields biased estimates of the causal effect of veteran status. Random assignment of Z_i does
not rule out this sort of bias since draft-eligibility can in principle have consequences in addition to influencing
the probability of being a veteran. In other words, while the randomization of Z_i ensures that the reduced-form
relationship between Y_i and Z_i represents the causal effect of draft eligibility on earnings, it does not guarantee
that the only reason for this relationship is D_i. The distinction between the assumed random assignment of an
instrument  and  the  assumption  that  a  single  causal  mechanism  explains  effects  on  outcomes  is  discussed  in 
greater  detail  by  Angrist,  Imbens,  and  Rubin  (1996). 

Finally,  the  use  of  2SLS  to  combine  many  different  instruments  can  lead  to  finite-sample  bias.  The 
standard  inference  framework  uses  asymptotic  theory,  i.e.,  inference  is  based  on  approximations  that  are 
increasingly  accurate  as  sample  sizes  grow.  Typically,  inferences  about  OLS  coefficient  estimates  also  use 
asymptotic theory since the relevant finite-sample theory assumes normally distributed errors. A key difference
between  IV  and  OLS  estimators,  however,  is  that  even  without  normality  OLS  provides  an  unbiased  estimate 
of  population  regression  coefficients  (provided  the  regression  function  is  linear;  see,  e.g.,  Goldberger,  1991, 
Chapter  13).  In  contrast,  IV  estimators  are  consistent  but  not  unbiased.  This  means  that  under  repeated 
sampling  with  a  fixed  sample  size,  IV  estimates  may  systematically  deviate  from  the  corresponding  population 
parameter.17  Moreover,  this  bias  tends  to  pull  IV  estimates  towards  the  corresponding  OLS  estimates,  giving 
a  misleading  impression  of  similarity  between  the  two  sets  of  estimates  (see,  e.g.,  Sawa,  1969). 

How  bad  is  the  finite-sample  bias  in  an  IV  estimate  likely  to  be?  In  practice,  this  largely  turns  on  the 
number  of  instruments  relative  to  the  sample  size,  and  the  strength  of  the  first-stage  relationship.  Other  things 
equal, more instruments, smaller samples, and weaker instruments each mean more bias (see, e.g., Buse, 1992).
The  fact  that  IV  estimates  can  be  noticeably  biased  even  with  very  large  data  sets  was  highlighted  by  Bound, 
Jaeger,  and  Baker  (1995),  focusing  on  Angrist  and  Krueger's  (1991)  compulsory  schooling  study.  This  study 
uses  hundreds  of  thousands  of  observations  from  Census  data  to  implement  an  instrumental  variables  strategy 
for estimating the returns to schooling. The instruments are quarter-of-birth dummies since children born
earlier in the year enter school at an older age and are therefore allowed to drop out of school (typically on their
16th birthday) after having completed less schooling. Some of the 2SLS estimates in Angrist and Krueger
(1991)  use  many  quarter-of-birth/state-of-birth  interaction  terms  in  addition  to  quarter-of-birth  main  effects 
as  instruments.  Since  the  underlying  first-stage  relationship  in  these  particular  models  is  not  very  strong,  there 
is  potential  for  substantial  bias  towards  the  OLS  estimates  in  these  specifications. 
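
The tendency of 2SLS with many weak instruments to drift towards OLS can be seen in a small Monte Carlo experiment. The design below is purely hypothetical (20 instruments, 200 observations, a first-stage coefficient of 0.05 on each instrument, and a true effect of 1.0) and is not meant to mimic the compulsory schooling application.

```python
import numpy as np

def simulate(n=200, k=20, pi=0.05, rho=0.8, delta=1.0, reps=300, seed=3):
    """Return the median OLS and 2SLS estimates over repeated samples of fixed size n."""
    rng = np.random.default_rng(seed)
    ols_est, tsls_est = [], []
    for _ in range(reps):
        Z = rng.normal(size=(n, k))                               # k instruments
        v = rng.normal(size=n)                                    # first-stage error
        u = rho * v + np.sqrt(1 - rho**2) * rng.normal(size=n)    # outcome error, corr(u, v) = rho
        d = Z @ np.full(k, pi) + v                                # weak first stage
        y = delta * d + u
        d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]          # first-stage fitted values
        ols_est.append((d @ y) / (d @ d))
        tsls_est.append((d_hat @ y) / (d_hat @ d))
    return np.median(ols_est), np.median(tsls_est)

median_ols, median_tsls = simulate()
print(median_ols, median_tsls)
```

In this design the median 2SLS estimate lies between the true value of 1.0 and the median OLS estimate; raising `pi` or lowering `k` moves it back towards 1.0.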

Bound,  Jaeger,  and  Baker  (1995)  discuss  the  question  of  how  strong  a  first-stage  relationship  has  to 
be  in  order  to  minimize  the  potential  for  bias.  They  suggest  using  the  F-statistic  for  the  joint  significance  of 
the excluded instruments in the first-stage equation as a diagnostic. This is clearly sensible, since, if the
instruments are so weak that the relationship between instruments and endogenous regressors cannot be
detected with a reasonably high level of confidence, then the instruments should probably be abandoned. On
the other hand, Hall, Rudebusch, and Wilcox (1996) point out that this sort of selection procedure also has
the potential to induce a bias from pre-testing that can in some cases aggravate the bias instead of reducing it.

17 A similar problem arises with Generalized Method of Moments estimation of models for covariance structures (see
Altonji and Segal, 1996).
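
The first-stage F-statistic for the excluded instruments can be computed by comparing the residual sums of squares from first-stage regressions with and without the instruments. The sketch below uses the standard F-test for nested linear models under homoskedasticity; the function name `first_stage_f` and the simulated first-stage coefficients are assumptions made for illustration.

```python
import numpy as np

def first_stage_f(d, X, Z):
    """F-statistic for the joint significance of the excluded instruments Z
    in a regression of d on [X, Z]; X should include a constant column."""
    def ssr(W):
        resid = d - W @ np.linalg.lstsq(W, d, rcond=None)[0]
        return resid @ resid
    W = np.column_stack([X, Z])
    q = Z.shape[1] if Z.ndim > 1 else 1    # number of excluded instruments
    dof = len(d) - W.shape[1]              # residual degrees of freedom
    return ((ssr(X) - ssr(W)) / q) / (ssr(W) / dof)

rng = np.random.default_rng(4)
n = 1000
X = np.ones((n, 1))
Z = rng.normal(size=(n, 3))
d_strong = Z @ np.array([0.5, 0.5, 0.5]) + rng.normal(size=n)
d_weak = Z @ np.array([0.01, 0.01, 0.01]) + rng.normal(size=n)

f_strong = first_stage_f(d_strong, X, Z)
f_weak = first_stage_f(d_weak, X, Z)
print(f_strong, f_weak)
```

A large value, as for `d_strong`, indicates a first stage that is easily detected; a value near 1, as is typical for `d_weak`, signals instruments too weak to be useful.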

A  simple  alternative  (or  complement)  to  screening  on  the  first-stage  F  is  to  use  estimators  that  are 
approximately unbiased. One such estimator is Limited Information Maximum Likelihood (LIML), which has no integral
moments  but  is  nevertheless  median-unbiased.  This  means  that  the  sampling  distribution  is  centered  at  the 
population  parameter.18  In  fact,  any  just-identified  2SLS  estimator  is  also  median-unbiased  since  2SLS  and 
LIML  are  identical  for  just-identified  models.  The  class  of  median-unbiased  instrumental  variables  estimators 
therefore  includes  the  Wald  estimator  discussed  in  the  previous  section.  Other  approximately  unbiased 
estimators  are  based  on  procedures  that  estimate  the  first-stage  and  second-stage  relationship  in  separate  data 
sets.  This  includes  Two-Sample  and  Split-Sample  IV  (Angrist  and  Krueger,  1992,  1995),  and  an  IV  estimator 
that  uses  a  set  of  leave-one-out  first-stage  estimates  called  Jackknife  Instrumental  Variables  (Angrist,  Imbens, 
and  Krueger,  1998).19  An  earlier  literature  discussed  combination  estimators  that  are  approximately  unbiased 
(see,  e.g.,  Sawa,  1973).  Recently,  Chamberlain  and  Imbens  (1996)  introduced  a  Bayesian  IV  estimator  that 
also  avoids  bias. 

A final and related point is that the reduced form OLS regression of the dependent variable on
exogenous covariates and instruments is unbiased in a sample of any size, regardless of the power of the
instrument (assuming the reduced form is linear). This is important because the reduced form effects of the
instrument on the dependent variable are proportional to the coefficient on the endogenous regressor in the
equation of interest. The existence of a causal relationship between the endogenous regressor and dependent
variable can therefore be gauged through the reduced form without fear of finite-sample bias even if the
instruments are weak.

18 Anderson, Kunitomo, and Sawa (1982, p. 1026) report this in a Monte Carlo study: "To summarize, the most
important conclusion from the study of LIML and 2SLS estimators is that the 2SLS estimator can be badly biased
and in that sense its use is risky. The LIML estimator, on the other hand, has a little more variability with a slight
chance of extreme values, but its distribution is centered at the parameter value." Similar Monte Carlo results and a
variety of analytic justifications for the approximate unbiasedness of LIML appear in Bekker (1994), Donald and
Newey (1997), Staiger and Stock (1997), and Angrist, Imbens, and Krueger (1998).

19 A SAS program that computes Split-Sample and Jackknife IV is available at
http://www.wws.princeton.edu/faculty/krueger.html.

2.2.4  Regression-discontinuity  designs 

The Latin motto Marshall placed on the title page of his Principles of Economics is "Natura non facit
saltum," which means "Nature does not make jumps." Marshall argues that most economic behavior evolves
gradually  enough  to  be  modeled  or  explained.  The  notion  that  human  behavior  is  typically  orderly  or  smooth 
is  at  the  heart  of  a  research  strategy  called  the  regression-discontinuity  (RD)  design.  RD  methods  use  some 
sort  of  parametric  or  semi-parametric  model  to  control  for  smooth  or  gradually  evolving  trends,  inferring 
causality  when  the  variable  of  interest  changes  abruptly  for  non-behavioral  or  arbitrary  reasons.  There  are  a 
number of ways to implement this idea in practice. We focus here on an approach that can be viewed as a hybrid
regression-control/IV identification strategy. This is distinct from conventional IV strategies because the
instruments are derived explicitly from nonlinearities or discontinuities in the relationship between the
regressor of interest and a control variable. Recent applications of the RD idea include van der Klaauw's
(1996)  study  of  financial  aid  awards;  Angrist  and  Lavy's  (1998)  study  of  class  size;  and  Hahn,  Todd,  and  van 
der  Klaauw's  (1998)  study  of  anti-discrimination  laws. 

The  RD  idea  originated  with  Campbell  (1969),  who  discussed  the  (theoretical)  problem  of  how  to 
identify  the  causal  effect  of  a  treatment  that  is  assigned  as  a  deterministic  function  of  an  observed  covariate 
which  is  also  related  to  the  outcomes  of  interest.  Campbell  used  the  example  of  estimating  the  effect  of 
National  Merit  scholarships  on  applicants'  later  academic  achievement.  He  argued  that  if  there  is  a  threshold 
value  of  past  achievement  that  determines  whether  an  award  is  made,  then  one  can  control  for  any  smooth 
function  of  past  achievement  and  still  estimate  the  effect  of  the  award  at  the  point  of  discontinuity.  This  is 
done by matching discontinuities or nonlinearities in the relationship between outcomes and past achievement
to discontinuities or nonlinearities in the relationship between awards and past achievement.20 van der Klaauw
(1996)  pointed  out  the  link  between  Campbell's  suggestion  and  IV,  and  used  this  idea  to  estimate  the  effect 
of  financial  aid  awards  on  college  enrollment.21 

Angrist  and  Lavy  (1998)  used  RD  to  estimate  the  effects  of  class  size  on  pupil  test  scores  in  Israeli 
public  schools,  where  class  size  is  officially  capped  at  40.  They  refer  to  the  cap  of  40  as  "Maimonides'  Rule," 
after  the  12th  Century  Talmudic  scholar  Maimonides,  who  first  proposed  it.  According  to  Maimonides'  Rule, 
class  size  increases  one-for-one  with  enrollment  until  40  pupils  are  enrolled,  but  when  41  students  are  enrolled, 
there  will  be  a  sharp  drop  in  class  size,  to  an  average  of  20.5  pupils.  Similarly,  when  80  pupils  are  enrolled, 
the  average  class  size  will  again  be  40,  but  when  81  pupils  are  enrolled  the  average  class  size  drops  to  27. 
Thus,  Maimonides'  Rule  generates  a  discontinuity  in  the  relationship  between  grade  enrollment  and  average 
class  size  at  integer  multiples  of  40. 

The class size function derived from Maimonides' Rule can be stated formally as follows. Let b_s
denote beginning-of-the-year enrollment in school s in a given grade, and let z_s denote the size assigned to
classes in school s, as predicted by applying Maimonides' Rule to that grade. Assuming cohorts are divided
into classes of equal size, the predicted class size for all classes in the grade is

      z_s = b_s / (int((b_s - 1)/40) + 1).

This function is plotted in Figure 2a for the population of Israeli fifth graders in 1991, along with actual fifth
grade  class  sizes.  The  x-axis  shows  September  enrollment  and  the  y-axis  shows  either  predicted  class  size  or 
the average actual class size in all schools with that enrollment. Maimonides' Rule does not predict actual
class size perfectly because other factors affect class size as well, but average class sizes clearly display a
sawtooth pattern induced by the Rule.

20 Goldberger (1972) discusses a similar idea in the context of compensatory education programs.

21 Campbell's (1969) discussion of RD focused mostly on what he called a "sharp design", where the regressor of
interest is a discontinuous but deterministic function of another variable. In the sharp design there is no need to
instrument: the regressor of interest is entered directly. This is in contrast with what Campbell called a "fuzzy
design", where the function is not deterministic. Campbell did not propose an estimator for the fuzzy design, though
his student Trochim (1984) developed an IV-like procedure for that case. The discussion here covers the fuzzy
design only since the sharp design can be viewed as a special case.

In  addition  to  exhibiting  a  strong  association  with  average  class  size,  Maimonides'  Rule  is  also 
correlated  with  average  test  scores.  This  is  shown  in  Figure  2b,  which  plots  average  reading  test  scores  and 
average  values  of  zs  by  enrollment  size,  in  enrollment  intervals  of  10.  The  figure  shows  that  test  scores  are 
generally  higher  in  schools  with  larger  enrollments  and,  therefore,  larger  predicted  class  sizes.  Most 
importantly,  however,  average  scores  by  enrollment  size  exhibit  a  sawtooth  pattern  that  is,  at  least  in  part,  the 
mirror  image  of  the  class  size  function.  This  is  especially  clear  in  Figure  2c,  which  plots  average  scores  by 
enrollment  after  running  auxiliary  regressions  to  remove  a  linear  trend  in  enrollment  and  the  effects  of  pupils' 
socioeconomic  background.22  RD  interprets  the  up  and  down  pattern  in  the  conditional  expectation  of  test 
scores  given  enrollment  as  reflecting  the  causal  effect  of  changes  in  class  size  that  are  induced  by  exogenous 
changes  in  enrollment.  This  interpretation  is  plausible  because  Maimonides'  Rule  is  known  to  have  this 
pattern,  while  it  seems  likely  that  other  mechanisms  linking  enrollment  and  test  scores  will  be  smoother. 

Figure  2b  makes  it  clear  that  Maimonides'  Rule  is  not  a  valid  instrument  for  class  size  without 
controlling  for  enrollment  because  predicted  class  size  increases  with  enrollment  and  test  scores  increase  with 
enrollment.  The  RD  idea  is  to  use  the  discontinuities  (jumps)  in  predicted  class  size  to  estimate  the  effect  of 
interest  while  controlling  for  smooth  enrollment  effects.  Angrist  and  Lavy  implement  this  by  using  zs  as  an 
instrument  while  controlling  for  smooth  effects  of  enrollment  using  parametric  enrollment  trends.  Consider 
a  causal  model  that  connects  the  score  of  pupil  i  in  school  s  with  class  size  plus  effects  of  the  variable  used  to 
construct  Maimonides'  Rule: 
(27)  \(y_{is} = X_s'\beta + n_{is}\delta + \varepsilon_{is}\),

where \(n_{is}\) is the size of i's class, and \(X_s\) is a vector of school characteristics, including functions of grade
enrollment, \(b_s\). As before, we imagine that this function tells us what test scores would be if class size were


22The figure plots the residuals from regressions of \(y_{is}\) and \(z_s\) on \(b_s\) and the proportion of low-income pupils in the
school. 
manipulated to be other than the observed size, \(n_{is}\). The first-stage equation for 2SLS estimation of (27) is
(28)  \(n_{is} = X_s'\pi_0 + z_s\pi_1 + v_{is}\).

A simple example is a model that includes \(b_s\) linearly to control for enrollment effects not attributable
to changing class size, along with a regressor measuring the proportion of low-income students in the school.23
The resulting 2SLS estimate of \(\delta\) in standard deviation units is -.037 (with a standard error of .009), meaning
just over a one-third standard deviation decline in test scores for a 10 pupil increase in class size.
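
The mechanics of this estimator can be sketched on simulated data. The example below is not the Angrist and Lavy specification or data: the data-generating process, the parameter values, and all function names are assumptions made for illustration. It uses predicted class size as an instrument and controls for enrollment linearly, computing the 2SLS coefficient by first partialling the linear enrollment control out of every variable.

```python
import random


def predicted_class_size(b, cap=40):
    """Maimonides' Rule: enrollment divided by the implied number of classes."""
    return b / ((b - 1) // cap + 1)


def residualize(v, b):
    """Residuals from an OLS regression of v on a constant and b."""
    n = len(v)
    mean_b, mean_v = sum(b) / n, sum(v) / n
    slope = (sum((bi - mean_b) * (vi - mean_v) for bi, vi in zip(b, v))
             / sum((bi - mean_b) ** 2 for bi in b))
    return [vi - mean_v - slope * (bi - mean_b) for bi, vi in zip(b, v)]


def rd_iv_estimate(y, n_size, z, b):
    """2SLS coefficient on class size, with z as instrument and a linear control for b."""
    y_t, n_t, z_t = residualize(y, b), residualize(n_size, b), residualize(z, b)
    return (sum(zi * yi for zi, yi in zip(z_t, y_t))
            / sum(zi * ni for zi, ni in zip(z_t, n_t)))


def simulate(n_schools=5000, delta=-0.04, seed=0):
    """Made-up data: unobserved quality q shifts both class size and scores."""
    rng = random.Random(seed)
    y, n_size, z, b = [], [], [], []
    for _ in range(n_schools):
        b_s = rng.randint(10, 200)            # enrollment
        z_s = predicted_class_size(b_s)       # instrument
        q = rng.gauss(0, 1)                   # unobserved confounder
        n_s = z_s + 2.0 * q + rng.gauss(0, 2) # actual class size
        y_s = 0.01 * b_s + delta * n_s + 0.3 * q + rng.gauss(0, 0.3)
        y.append(y_s)
        n_size.append(n_s)
        z.append(z_s)
        b.append(b_s)
    return y, n_size, z, b


if __name__ == "__main__":
    y, n_size, z, b = simulate()
    print(round(rd_iv_estimate(y, n_size, z, b), 3))  # should be near the assumed delta
```

Because the simulated enrollment effect is exactly linear, the linear control is sufficient here; with real data the estimate can be sensitive to how the smooth enrollment effect is modeled.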

Since  RD  is  an  IV  estimator,  we  do  not  have  a  separate  section  for  pitfalls.  As  before,  the  most 
important  issue  is  instrument  validity  and  the  choice  of  control  variables.  The  choice  of  controls  is  even  more 
important  in  RD  than  conventional  IV,  however,  since  the  instrument  is  actually  a  function  of  one  of  the 
control variables. In the Angrist and Lavy application, for example, identification of \(\delta\) clearly turns on the
ability to distinguish \(z_s\) from \(X_s\) since \(z_s\) does not vary within schools. This suggests that RD depends more on
functional form assumptions than other IV procedures, though Hahn, Todd, and van der Klaauw (1998)
consider  ways  to  weaken  this  dependence. 

2.3  Consequences  of  heterogeneity  and  nonlinearity 

The  discussion  so  far  involves  a  highly  stylized  description  of  the  world,  wherein  causal  effects  are 
the  same  for  everyone,  and,  if  the  causing  variable  takes  on  more  than  two  values,  the  effects  are  linear. 
Although  some  economic  models  can  be  used  to  justify  these  assumptions,  there  is  no  reason  to  believe  this 
is  true  in  general.  On  the  other  hand,  these  strong  assumptions  provide  a  useful  starting  place  because  they 
may  provide  a  good  approximation  of  reality,  and  because  they  focus  attention  on  causality  issues.  If  the 
estimates  of  a  linear,  constant-coefficient  model  are  biased  for  the  causal  effect  of  interest,  then  the  estimates 
are  only  more  difficult  to  interpret  in  a  general  setting. 

The  cost  of  these  simplifying  assumptions  is  that  they  gloss  over  the  fact  that  even  when  a  set  of 


23In practice, Angrist and Lavy estimated (27) and (28) using class-level averages and not micro data.
estimates has a causal interpretation, they are generated by variation for a particular group of individuals over
a  limited  range  of  variation  in  the  causing  variable.  There  is  a  tradition  in  Psychology  of  distinguishing 
between  the  question  of  internal  validity,  i.e.,  whether  an  empirical  relationship  has  a  causal  interpretation  in 
the  setting  where  it  is  observed,  and  the  question  of  external  validity,  i.e.,  whether  a  set  of  internally  valid 
estimates  has  predictive  value  for  other  groups  or  values  of  the  response  variable  than  those  observed  in  a 
given  study.24  Constant-coefficient  and  linear  models  make  it  harder  to  discuss  the  two  types  of  validity 
separately,  since  external  validity  is  automatic  in  a  constant-coefficients-linear  setting.  Taken  literally,  for 
example,  the  constant-effects  model  says  that  the  economic  consequences  of  military  service  are  the  same  for 
high-school  dropouts  and  college  graduates.  Similarly,  the  linear  model  says  the  economic  value  of  a  year  of 
schooling  is  the  same  whether  the  year  is  second  grade  or  the  last  year  of  college.  We  therefore  discuss  the 
interpretation  of  traditional  estimators  when  constant-effects  and  linearity  assumptions  are  relaxed. 

2.3.1  Regression  and  the  conditional  expectation  function 

Returning to the schooling example of Section 2.2.1, the causal relationship of interest is \(f_i(S)\), which
describes the effect of schooling on earnings. In the absence of any further assumptions, the average causal
response function is \(E[f_i(S)]\), with average derivative \(E[f_i'(S)]\). Earlier, we assumed \(f_i'(S)\) is equal to a constant,
\(\rho\), in which case averaging is not needed. In practice, however, the derivative may be heterogeneous; that is,
it may vary with i or with i's characteristics, \(X_i\). In economics, models for heterogeneous treatment effects are
commonly called "random coefficient" models (see, e.g., Bjorklund and Moffitt, 1987 and Heckman and
Robb, 1985 for discussions of such models). The derivative also might be non-constant (i.e., vary with S).
In either case, it makes sense to focus on the average response function or its average derivative. The principal
statistical tool for doing this is the Conditional Expectation Function (CEF) of \(Y_i\) given \(S_i\), i.e., \(E[Y_i \mid S_i=S]\)
or \(E[Y_i \mid X_i, S_i=S]\), viewed as a function of S.


24See,  e.g.,  Campbell  and  Stanley  (1963)  and  Meyer  (1995). 
To see the connection between the CEF and the average causal response, consider first the difference
in average earnings between people with S years of schooling and people with S-1 years of schooling:

\(E[Y_i \mid S_i=S] - E[Y_i \mid S_i=S-1]\)

\(= E[f_i(S) - f_i(S-1) \mid S_i=S] + \{E[f_i(S-1) \mid S_i=S] - E[f_i(S-1) \mid S_i=S-1]\}\).
The first term in this decomposition is the average causal effect of going from S-1 to S years of schooling for
those who actually have S years of education. The counterfactual average \(E[f_i(S-1) \mid S_i=S]\) is never observed,
however. The second term reflects the fact that the average earnings of those with S-1 years of schooling do
not necessarily provide a good answer to the "what if" question for those with S years of schooling. This term
is the counterpart of regression-style "omitted variables bias" for this more general model.

In  this  setting,  the  selection-on-observables  assumption  asserts  that  conditioning  on  a  set  of  observed 
characteristics, \(X_i\), serves to eliminate the omitted variables bias in naive comparisons. That is,
(29)  \(E[f_i(S-1) \mid X_i, S_i=S] = E[f_i(S-1) \mid X_i, S_i=S-1]\) for all S,

so that conditional on \(X_i\), the CEF and average causal response function are the same:

\(E[Y_i \mid X_i, S_i=S] = E[f_i(S) \mid X_i]\).
In this case, the conditional-on-X comparison does estimate the causal effect of schooling:

\(E[Y_i \mid X_i, S_i=S] - E[Y_i \mid X_i, S_i=S-1] = E[f_i(S) - f_i(S-1) \mid X_i]\).
This is analogous to the notion that adding \(X_i\) to a regression eliminates omitted variables bias in OLS
estimates  of  the  returns  to  schooling. 

The  preceding  discussion  provides  sufficient  conditions  for  the  CEF  to  have  a  causal  interpretation. 
We  next  consider  the  relationship  between  regression  parameters  and  the  CEF.  One  interpretation  of 
regression  is  that  the  population  OLS  slope  vector  provides  the  minimum  mean  squared  error  (MMSE)  linear 
approximation  to  the  CEF.  This  feature  of  regression  is  discussed  in  Goldberger's  (1991)  econometrics  text 
(see especially Section 5.5).25 A related property is the fact that regression coefficients have an "average
derivative" interpretation. In multivariate regression models, however, this interpretation is complicated by
the fact that the OLS slope vector is actually a matrix-weighted average of the gradient of the CEF. Matrix-weighted
averages are difficult to interpret except in special cases (see Chamberlain and Leamer, 1976).26

One  interesting  special  case  where  the  OLS  slope  vector  can  be  readily  interpreted  is  when  there  is 
a  single  regressor  of  interest  and  the  CEF  of  this  regressor  given  all  other  regressors  is  linear,  so  that 

(30)  \(E[S_i \mid X_i] = X_i'\pi\),

where \(\pi\) is a conformable vector of coefficients. This assumption is satisfied in the schooling regression, for
example, in a model where all X-variables are discrete and the parameterization allows a separate effect for
each possible value of \(X_i\). This is not unrealistic in applications with large data sets; see, for example, Angrist
and Krueger (1991) and Angrist (1998). In this case, the population regression coefficient from a regression
of \(Y_i\) on \(X_i\) and \(S_i\) can be written

(31)  \(\rho_r = E[(S_i - E[S_i \mid X_i])Y_i] / E[(S_i - E[S_i \mid X_i])S_i] = E[(S_i - E[S_i \mid X_i])E[Y_i \mid X_i, S_i]] / E[(S_i - E[S_i \mid X_i])S_i]\),
which is derived by iterating expectations over \(X_i\) and \(S_i\).

Maintaining assumption (30), i.e., that \(E[S_i \mid X_i]\) is linear in \(X_i\), first consider the
case where \(E[Y_i \mid X_i, S_i]\) is linear in \(S_i\) but not \(X_i\). Then we can write

\(\rho_X \equiv E[Y_i \mid X_i, S_i=S] - E[Y_i \mid X_i, S_i=S-1]\)
for all S, which means

(32)  \(E[Y_i \mid X_i, S_i] = E[Y_i \mid X_i, S_i=0] + S_i\rho_X\).

In other words, the CEF is linear in schooling, but the schooling coefficient is not constant and depends on \(X_i\).


25Proof that OLS gives the MMSE linear approximation of the CEF: The vector of population regression coefficients
for regressor vector \(W_i\) solves \(\min_b E(Y_i - W_i'b)^2\). But \((Y_i - W_i'b)^2 = [(Y_i - E[Y_i \mid W_i]) + (E[Y_i \mid W_i] - W_i'b)]^2\) and
\(E[(Y_i - E[Y_i \mid W_i])(E[Y_i \mid W_i] - W_i'b)] = 0\), so \(\min_b E(E[Y_i \mid W_i] - W_i'b)^2\) has the same solution.

26The population slope vector is \(E[W_iW_i']^{-1}E[W_iY_i] = E[W_iW_i']^{-1}E[W_iE(Y_i \mid W_i)]\). Linearizing the CEF, we have
\(E(Y_i \mid W_i) = E(Y_i \mid W_i=0) + W_i'\nabla E(Y_i \mid w_i)\), where \(\nabla E(Y_i \mid w_i)\) is the gradient of the conditional expectation function, and
\(w_i\) is a random variable that lies between \(W_i\) and zero. So the slope vector is \(E[W_iW_i']^{-1}E[(W_iW_i')\nabla E(Y_i \mid w_i)]\),
which is a matrix-weighted average of the gradient with weights \((W_iW_i')\).
Substituting (32) into (31), we have

(33)  \(\rho_r = E[(S_i - E[S_i \mid X_i])^2\rho_X] / E[(S_i - E[S_i \mid X_i])^2] = E[\sigma_S^2(X_i)\rho_X] / E[\sigma_S^2(X_i)]\),

where \(\sigma_S^2(X_i) \equiv E[(S_i - E[S_i \mid X_i])^2 \mid X_i]\) is the variance of \(S_i\) given \(X_i\). So in this case, regression provides a
variance-weighted average of the slope at each \(X_i\). Values of \(X_i\) that get the most weight are those where the
conditional variance of schooling is largest.
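
The variance weighting in (33) can be checked numerically. The sketch below uses simulated data in which every design value (cell-specific slopes, schooling ranges, noise) is an assumption made for illustration. It computes the coefficient on schooling from a regression with a full set of covariate dummies, obtained here by within-cell demeaning, and compares it with the average of cell-specific slopes weighted by cell size times the within-cell variance of schooling.

```python
import random
from collections import defaultdict


def group_by_x(rows):
    """Collect (s, y) pairs for each covariate cell x."""
    cells = defaultdict(list)
    for x, s, y in rows:
        cells[x].append((s, y))
    return cells


def saturated_ols_slope(rows):
    """Coefficient on s from OLS of y on s and a full set of x dummies.

    With saturated dummies this equals the slope in within-cell demeaned data.
    """
    s_dev, y_dev = [], []
    for obs in group_by_x(rows).values():
        mean_s = sum(s for s, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        s_dev += [s - mean_s for s, _ in obs]
        y_dev += [y - mean_y for _, y in obs]
    return sum(a * b for a, b in zip(s_dev, y_dev)) / sum(a * a for a in s_dev)


def variance_weighted_slope(rows):
    """Average of cell-specific slopes, weighted by cell size times var(s | x)."""
    num = den = 0.0
    for obs in group_by_x(rows).values():
        n = len(obs)
        mean_s = sum(s for s, _ in obs) / n
        mean_y = sum(y for _, y in obs) / n
        var_s = sum((s - mean_s) ** 2 for s, _ in obs) / n
        cov_sy = sum((s - mean_s) * (y - mean_y) for s, y in obs) / n
        slope_x = cov_sy / var_s  # regression slope within this cell
        num += n * var_s * slope_x
        den += n * var_s
    return num / den


def simulate(n=3000, seed=1):
    """Made-up data: the slope and the spread of schooling both differ by cell."""
    rng = random.Random(seed)
    design = {0: (0.05, 11, 13), 1: (0.10, 10, 16), 2: (0.15, 8, 20)}  # slope, min s, max s
    rows = []
    for _ in range(n):
        x = rng.choice(list(design))
        slope, lo, hi = design[x]
        s = rng.randint(lo, hi)
        rows.append((x, s, 1.0 + 0.2 * x + slope * s + rng.gauss(0, 0.1)))
    return rows


if __name__ == "__main__":
    rows = simulate()
    print(saturated_ols_slope(rows), variance_weighted_slope(rows))
```

In this design the cell with the steepest slope also has the most dispersed schooling, so the regression coefficient lies above the simple average of the three cell slopes.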

What if the schooling increment in the CEF varies with both \(X_i\) and \(S_i\)? Let

\(\rho_{SX} \equiv E[Y_i \mid X_i, S_i=S] - E[Y_i \mid X_i, S_i=S-1]\),

where the \(\rho_{SX}\) notation reflects variation with both S and \(X_i\). Then the coefficient on \(S_i\) in a regression of \(Y_i\)
on \(X_i\) and \(S_i\) can be written

(34)  \(\rho_r = E[\sum_{S=1}^{\bar{s}} \rho_{SX}\mu_{SX}] / E[\sum_{S=1}^{\bar{s}} \mu_{SX}]\),

where

\(\mu_{SX} \equiv (E[S_i \mid X_i, S_i \geq S] - E[S_i \mid X_i, S_i < S])(P[S_i \geq S \mid X_i](1 - P[S_i \geq S \mid X_i])) \geq 0\),

and S takes on values in the set \(\{0, 1, \ldots, \bar{s}\}\). This result, which is proved in the appendix, is a generalization
of the formula for bivariate regression coefficients given by Yitzhaki (1996).27

The weighting formula in (34) has a sum and an expectation. The sum averages \(\rho_{SX}\) for all schooling
increments, given a particular value of \(X_i\) (this averaging matters if the CEF is nonlinear). The expectation then
averages this sum over the distribution of \(X_i\) (this averaging matters if the response function is heterogeneous).
The formula for the weights, \(\mu_{SX}\), can be used to characterize the OLS slope vector. First, for any particular
\(X_i\), weight is given to \(\rho_{SX}\) for each S in proportion to the change in the conditional mean of \(S_i\), as \(S_i\) falls above
or below S. More weight is also given to points in the domain of \(f_i(S)\) that are close to the conditional median
of \(S_i\) given \(X_i\) since this is where \(P[S_i \geq S \mid X_i](1 - P[S_i \geq S \mid X_i])\) is maximized. Second, as in the linear case
discussed above, weight is also given in proportion to the conditional variance of \(S_i\) given \(X_i\), except now this


27Yitzhaki gives examples and describes the OLS weighting function for a model with a single continuously
distributed regressor in detail. For Normally distributed regressors, the weighting function is the Normal density
function, so that OLS provides a density-weighted average of the sort discussed by Powell, Stock, and Stoker (1989).
For  an  alternative  non-parametric  interpretation  of  OLS  coefficients  see  Stoker  (1986). 


variance is defined separately for each S using dummies for the event that \(S_i \geq S\). Note also that the OLS
estimate contains no information about the returns to schooling for values of \(X_i\) where \(P[S_i \geq S \mid X_i]\) equals 0 or
1. This includes values of \(X_i\) where \(S_i\) does not vary across observations, because \(P[S_i \geq S \mid X_i]=1\) if \(P[S_i=S \mid X_i]=1\).
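
The weights in (34) can be verified in the special case without covariates, where the formula reduces to a weighted average of the increments in the CEF. The sketch below works with population quantities for a made-up distribution of schooling on {0, ..., 4} and a made-up nonlinear CEF; both tables and all function names are assumptions for illustration only. It assumes every integer value in the support has positive probability.

```python
def ols_slope(pmf, cef):
    """Population slope from a regression of Y on S, where cef[s] = E[Y | S = s]."""
    mean_s = sum(p * s for s, p in pmf.items())
    mean_y = sum(p * cef[s] for s, p in pmf.items())
    cov_sy = sum(p * (s - mean_s) * (cef[s] - mean_y) for s, p in pmf.items())
    var_s = sum(p * (s - mean_s) ** 2 for s, p in pmf.items())
    return cov_sy / var_s


def increment_weights(pmf):
    """mu_S = (E[S | S >= s] - E[S | S < s]) * P(S >= s) * P(S < s) for s = 1, ..., s_max."""
    weights = {}
    for s in sorted(pmf)[1:]:
        p_hi = sum(p for v, p in pmf.items() if v >= s)
        p_lo = sum(p for v, p in pmf.items() if v < s)
        mean_hi = sum(p * v for v, p in pmf.items() if v >= s) / p_hi
        mean_lo = sum(p * v for v, p in pmf.items() if v < s) / p_lo
        weights[s] = (mean_hi - mean_lo) * p_hi * p_lo
    return weights


def weighted_increment_average(pmf, cef):
    """Weighted average of the CEF increments rho_S = E[Y|S=s] - E[Y|S=s-1]."""
    mu = increment_weights(pmf)
    rho = {s: cef[s] - cef[s - 1] for s in mu}
    return sum(rho[s] * mu[s] for s in mu) / sum(mu.values())


# Hypothetical schooling distribution and nonlinear CEF (illustrative values only).
PMF = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}
CEF = {0: 1.0, 1: 1.1, 2: 1.2, 3: 1.5, 4: 1.4}

if __name__ == "__main__":
    print(ols_slope(PMF, CEF), weighted_increment_average(PMF, CEF))
    print(increment_weights(PMF))
```

In this example the weights are largest for the increments nearest the middle of the schooling distribution, and the negative increment at the top receives relatively little weight.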

The  weighting  function  is  illustrated  in  Figure  3  using  data  from  the  1990  Census.  The  top  panel  plots 
an  estimate  of  the  earnings-schooling  CEF,  i.e.,  average  log  weekly  wages  against  years  of  schooling  for  men 
with 8-20 years of schooling, adjusted for covariates. In other words, the plot shows \(E\{E[Y_i \mid X_i, S_i=S]\}\), plotted
against  S.  Years  of  schooling  are  not  recorded  in  the  1990  Census  and  were  therefore  imputed  from  categorical 
schooling  variables  as  described  in  the  appendix.  The  X-variables  are  race  (white,  nonwhite),  age  (40-49),  and 
state  of  birth.  The  covariates  in  this  case  are  similar  to  those  used  in  some  of  the  specifications  in  the  Angrist 
and  Krueger  (1991)  study  of  the  returns  to  schooling,  although  the  data  underlying  this  figure  are  more  recent. 

The dotted line in the figure plots the change in \(E\{E[Y_i \mid X_i, S_i=S]\}\) with S. This is the covariate-adjusted
difference in average log weekly wages at each schooling increment,

\(\rho_S \equiv E\{E[Y_i \mid X_i, S_i=S] - E[Y_i \mid X_i, S_i=S-1]\} = \sum_X \rho_{SX} P(X_i=X)\).
For example, the first point on the dotted line is an estimate of \(\rho_9\), which is the average difference in
earnings between those with 9 years of schooling and those with 8 years of schooling, adjusting for differences
in the distribution of \(X_i\) between the two schooling groups.28 The returns measured in this way are remarkably
stable  until  13  years  of  schooling,  but  quite  variable  after  that  and  sometimes  even  negative. 

The  more  lightly  shaded  line  in  the  figure  is  the  OLS  regression  line  obtained  from  fitting  equation 
(1) with a saturated model for \(X_i\) (in other words, the model includes a full set of dummies, \(d_{iX}\), which equal
one when \(X_i=X\), for every value X; the OLS estimate of \(\rho\) in this case is .094). This parameterization satisfies
assumption (30), i.e., \(E[S_i \mid X_i]\) is linear. The figure illustrates the sense in which OLS captures the average
return. The OLS weighting function for each value of \(S_i\) is plotted in the lower panel, along with the


28The unadjusted difference in average wages is \(\{E[Y_i \mid S_i=S] - E[Y_i \mid S_i=S-1]\}\), which equals \(\{E[E(Y_i \mid X_i, S_i=S) \mid S_i=S] -
E[E(Y_i \mid X_i, S_i=S-1) \mid S_i=S-1]\}\).
histogram of schooling.29 Like the distribution of schooling itself, the OLS weighting scheme puts the most
weight on values between 12 and 16. It is interesting to note, however, that while the histogram of schooling is
bimodal, the weighting function is smoother and unimodal. Moreover, the population average of \(\rho_S\), i.e., the
weighted average of the covariate-adjusted return using the schooling histogram, \(\sum_S \rho_S P(S_i=S)\), is .144, which
is  considerably  larger  than  the  OLS  estimate.  This  is  because  about  half  of  the  sample  has  12-13  years  of 
schooling,  where  the  returns  are  .136  and  .148.  The  OLS  weighting  function  gives  more  weight  than  the 
histogram  to  other  schooling  values,  like  14,  15,  and  17,  where  the  returns  are  small  and  even  negative. 

2.3.2  Matching  instead  of  regression 

The  previous  section  shows  how  regression  produces  a  weighted  average  of  covariate-specific  effects 
for  each  value  of  the  causing  variable.  The  empirical  consequences  of  the  OLS  weighting  scheme  in  any 
particular  application  depend  on  the  distribution  of  regressors  and  the  amount  of  heterogeneity  in  the  causal 
effect  of  interest.  Matching  methods  provide  an  alternative  estimation  strategy  that  affords  more  control  over 
the  weighting  scheme  used  to  produce  average  causal  effects.  Matching  methods  also  have  the  advantage  of 
making  the  comparisons  that  are  used  for  statistical  identification  transparent.  Matching  is  most  practical  in 
cases  where  the  causing  variable  takes  on  two  values,  as  in  the  union  status  and  military  service  examples 
discussed  previously. 

Again,  we  use  the  example  of  estimating  the  effect  of  military  service  to  illustrate  this  technique. 
Angrist (1998) used matching and regression to estimate the effects of voluntary military service on civilian
earnings. As in the Vietnam study, the potential outcomes are \(Y_{0i}\), denoting what someone would earn if they
did not serve in the military, and \(Y_{1i}\), denoting earnings as a veteran. Since \(Y_{1i}-Y_{0i}\) is not constant, and we never
observe both potential outcomes for any one person, it makes sense to focus on average effects. One
possibility is the "average treatment effect," \(E[Y_{1i}-Y_{0i}]\), but this is not usually the first choice in studies of this


29Since the regression model has covariates, the weights vary with \(X_i\) as well as for each schooling increment. The
average weighting function plotted in the figure is \(\sum_X \mu_{SX} P(X_i=X)\).
kind since people who serve in the military tend to have personal characteristics that differ, on average, from
those  of  people  who  didn't  serve.  The  manpower  policy  innovations  that  are  typically  contemplated  affect 
those  individuals  who  either  would  now  serve  or  who  might  be  expected  to  serve  in  the  future.  For  example, 
between  1989  and  1992,  the  size  of  the  military  declined  sharply  because  of  increasing  enlistment  standards. 
Policy  makers  would  like  to  know  whether  the  people  who  would  have  served  under  the  old  rules  but  are 
unable  to  enlist  under  the  new  rules  were  hurt  by  the  lost  opportunity  for  service.  This  sort  of  reasoning  leads 
researchers to try to estimate the "effect of treatment on the treated," which is \(E[Y_{1i}-Y_{0i} \mid D_i=1]\) in our notation.30

As  in  the  study  of  Vietnam  veterans,  simply  comparing  the  earnings  of  veterans  and  nonveterans  is 
unlikely  to  provide  a  good  estimate  of  the  effect  of  military  service  on  veterans.  The  comparison  by  veteran 
status  is 

\(E[Y_{1i} \mid D_i=1] - E[Y_{0i} \mid D_i=0] = E[Y_{1i} - Y_{0i} \mid D_i=1] + \{E[Y_{0i} \mid D_i=1] - E[Y_{0i} \mid D_i=0]\}\).
This is the average causal effect of military service on veterans, \(E[Y_{1i} - Y_{0i} \mid D_i=1]\), plus a bias term attributable
to  the  fact  that  the  earnings  of  nonveterans  are  not  necessarily  representative  of  what  veterans  would  have 
earned  had  they  not  served  in  the  military.  For  example,  veterans  may  have  higher  earnings  simply  because 
they  must  have  higher  test  scores  and  be  high  school  graduates  to  meet  military  screening  rules. 

The bias term in naive comparisons goes away if \(D_i\) is randomly assigned because then \(D_i\) will be
independent of \(Y_{0i}\) and \(Y_{1i}\). Since voluntary military service is not randomly assigned (and there is no longer
a draft lottery), Angrist (1998) used matching and regression techniques to control for observed differences
between veterans and nonveterans who applied to get into the all-volunteer forces between 1979 and 1982. The
motivation  for  a  control  strategy  in  this  case  is  the  fact  that  the  military  screens  applicants  to  the  armed  forces 
primarily  on  the  basis  of  age,  schooling,  and  test  scores,  characteristics  that  are  observed  in  the  Angrist  (1998) 
data.  Identification  in  this  case  is  based  on  the  claim  that  after  conditioning  on  all  of  the  observed 
characteristics  that  are  known  to  affect  veteran  status,  veterans  and  nonveterans  are  comparable  in  the  sense 


30Heckman and Robb (1985) make this point about the effect of subsidized training programs.
that

(35)  \(E[Y_{0i} \mid X_i, D_i=1] = E[Y_{0i} \mid X_i, D_i=0]\).

This assumption seems plausible for two reasons. First, the nonveterans who provide observations on \(Y_{0i}\) did
in fact apply to get into the military. Second, selection for military service from the pool of applicants is based
almost entirely on variables that are observed and included in the X-variables. Variation in veteran status
conditional on \(X_i\) comes solely from the fact that some qualified applicants nevertheless fail to enlist at the last
minute. Of course, the considerations that lead a qualified applicant to "drop out" of the enlistment process
could  be  related  to  earnings  potential,  so  assumption  (35)  is  clearly  not  guaranteed. 

Given  assumption  (35),  the  effect  of  treatment  on  the  treated  can  be  constructed  as  follows: 

(36)  \(E[Y_{1i} - Y_{0i} \mid D_i=1] = E\{E[Y_{1i} \mid X_i, D_i=1] - E[Y_{0i} \mid X_i, D_i=1] \mid D_i=1\}\)

\(= E\{E[Y_{1i} \mid X_i, D_i=1] - E[Y_{0i} \mid X_i, D_i=0] \mid D_i=1\} = E[\delta_X \mid D_i=1]\),
where

\(\delta_X \equiv E[Y_i \mid X_i, D_i=1] - E[Y_i \mid X_i, D_i=0]\).
Here \(\delta_X\) is a random variable that represents the set of differences in mean earnings by veteran status
corresponding to each value taken on by \(X_i\). This is analogous to \(\rho_X\) that was defined for the schooling
problem. Note, however, that since \(D_i\) is binary, the response function is automatically linear in \(D_i\).

The matching estimator in Angrist (1998) uses the fact that \(X_i\) is discrete to construct the sample
analog of (36), which can also be written

(37)  \(E[Y_{1i} - Y_{0i} \mid D_i=1] = \sum_X \delta_X P(X_i=X \mid D_i=1)\),

where \(P(X_i=X \mid D_i=1)\) is the probability mass function for \(X_i\) given \(D_i=1\) and the summation is over the values
of \(X_i\).31 In this case, \(X_i\) takes on values determined by all possible combinations of year of birth, AFQT test-score
group,32 year of application to the military, and educational attainment at the time of application.
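
A minimal sketch of the sample analog of (37) follows. The records, cell labels, and function names are hypothetical and chosen so the arithmetic can be checked by hand; cells that lack either treated or control observations are skipped, which is one possible convention rather than the procedure used in Angrist (1998).

```python
from collections import defaultdict


def naive_difference(records):
    """Difference in mean outcomes between treated (d=1) and untreated (d=0) observations."""
    treated = [y for _, d, y in records if d == 1]
    controls = [y for _, d, y in records if d == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


def matching_tot(records):
    """Covariate-specific differences in means, weighted by the number of treated per cell."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in records:
        cells[x][d].append(y)
    num, n_treated = 0.0, 0
    for groups in cells.values():
        treated, controls = groups[1], groups[0]
        if not treated or not controls:
            continue  # no within-cell comparison is available
        delta_x = sum(treated) / len(treated) - sum(controls) / len(controls)
        num += delta_x * len(treated)
        n_treated += len(treated)
    return num / n_treated


# Hypothetical (cell, treatment, outcome) records.
RECORDS = [
    ("a", 1, 10.0), ("a", 1, 12.0), ("a", 0, 9.0),
    ("b", 1, 20.0), ("b", 0, 18.0), ("b", 0, 22.0), ("b", 0, 20.0),
]

if __name__ == "__main__":
    print(naive_difference(RECORDS))  # compares groups with different covariate mixes
    print(matching_tot(RECORDS))      # compares like with like, then averages over the treated
```

Weighting each cell by its count of treated observations is the sample counterpart of weighting by the distribution of covariates among the treated.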


31This matching estimator is discussed by Rubin (1977) and used by Card and Sullivan (1988) to estimate the effect
of subsidized training on employment.

32This is the Armed Forces Qualification Test, used by the military to screen applicants.
Naive comparisons clearly overestimate the benefit of military service. This can be seen in Table 6,
which reports differences-in-means, matching, and regression estimates of the effect of voluntary military service
on the 1988-91 Social Security-taxable earnings of men who applied to join the military between 1979 and
1982. The matching estimates were constructed from the sample analog of (37), i.e., from covariate-value-specific
differences in earnings, \(\delta_X\), weighted to form a single estimate using the distribution of covariates
among veterans. Although white veterans earn $1,233 more than nonveterans, this difference becomes
negative once the adjustment for differences in covariates is made. Similarly, while non-white
veterans  earn  $2,449  more  than  nonveterans,  controlling  for  covariates  reduces  this  to  $840. 

Table  6  also  shows  regression  estimates  of  the  effect  of  voluntary  service,  controlling  for  exactly  the 
same covariates used in the matching estimates. These are estimates of \(\delta_r\) in the equation

(38)  \(Y_i = \sum_X d_{iX}\beta_X + \delta_r D_i + \varepsilon_i\),

where \(\beta_X\) is a regression-effect for \(X_i=X\) and \(\delta_r\) is the regression parameter. This corresponds to a saturated
model for \(X_i\). Despite the fact that the matching and regression estimates control for the same variables, the
regression  estimates  are  significantly  larger  than  the  matching  estimates  for  both  whites  and  nonwhites.33 
The  reason  the  regression  estimates  are  larger  than  the  matching  estimates  is  that  the  two  estimation 
strategies  use  different  weighting  schemes.  While  the  matching  estimator  combines  covariate-value-specific 
estimates, \(\delta_X\), to produce an estimate of the effect of treatment on the treated, regression produces a variance-weighted
average of these effects. To see this, note that since \(D_i\) is binary and \(E[D_i \mid X_i]\) is linear, formula (33)
from the previous section implies

\(\delta_r = E[(D_i - E[D_i \mid X_i])^2\delta_X] / E[(D_i - E[D_i \mid X_i])^2] = E[\sigma_D^2(X_i)\delta_X] / E[\sigma_D^2(X_i)]\).
But in this case, \(\sigma_D^2(X_i) = P(D_i=1 \mid X_i)(1 - P(D_i=1 \mid X_i))\), so

\(\delta_r = \frac{\sum_X \delta_X [P(D_i=1 \mid X_i=X)(1 - P(D_i=1 \mid X_i=X))] P(X_i=X)}{\sum_X [P(D_i=1 \mid X_i=X)(1 - P(D_i=1 \mid X_i=X))] P(X_i=X)}\).


33The formula for the covariance of regression and matching estimates is derived in Angrist (1998, p. 274).
In other words, regression weights each covariate-specific treatment effect by \([P(D_i=1 \mid X_i=X)(1 - P(D_i=1 \mid X_i=X))]P(X_i=X)\).

In  contrast,  the  matching  estimator,  (37),  can  be  written 

\(E[Y_{1i} - Y_{0i} \mid D_i=1] = \frac{\sum_X \delta_X P(D_i=1 \mid X_i=X) P(X_i=X)}{\sum_X P(D_i=1 \mid X_i=X) P(X_i=X)}\),
because \(P(X_i=X \mid D_i=1) = P(D_i=1 \mid X_i=X) P(X_i=X) / P(D_i=1)\).

The weights underlying \(E[Y_{1i} - Y_{0i} \mid D_i=1]\) are proportional to the probability of veteran status at each
value of the covariates. So the men most likely to serve get the most weight in estimates of the effect of
treatment on the treated. In contrast, regression estimation weights each of the underlying treatment effects
by the conditional variance of treatment status, which in this case is maximized when \(P(D_i=1 \mid X_i=X)=1/2\). Of
course, the difference in weighting schemes is of no importance if the effect of interest does not vary with \(X_i\).
But Figure 4, which plots X-specific estimates (\(\delta_X\)) of the effect of veteran status on average 1988-91 earnings
against \(P[D_i=1 \mid X_i=X]\), shows that the men who were most likely to serve in the military benefit least from their
service.  This  fact  leads  matching  estimates  of  the  effect  of  military  service  to  be  smaller  than  regression 
estimates  based  on  the  same  vector  of  controls. 
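
The two weighting schemes can be compared directly on a small table of covariate cells. The numbers below are invented for illustration and are not taken from Table 6 or Figure 4; each cell lists an assumed population share, an assumed probability of treatment, and an assumed cell-specific effect, arranged so that the effect is smallest where treatment is most likely.

```python
def matching_estimand(cells):
    """Effect of treatment on the treated: weights proportional to P(D=1|X) * P(X)."""
    num = sum(delta * p * px for px, p, delta in cells)
    den = sum(p * px for px, p, _ in cells)
    return num / den


def regression_estimand(cells):
    """Saturated-regression estimand: weights proportional to P(D=1|X)(1 - P(D=1|X)) * P(X)."""
    num = sum(delta * p * (1 - p) * px for px, p, delta in cells)
    den = sum(p * (1 - p) * px for px, p, _ in cells)
    return num / den


# Hypothetical cells: (P(X = x), P(D = 1 | X = x), delta_x).
CELLS = [
    (0.25, 0.9, -500.0),   # very likely to be treated, small (negative) effect
    (0.50, 0.5, 1000.0),
    (0.25, 0.1, 3000.0),   # unlikely to be treated, large effect
]

if __name__ == "__main__":
    print(matching_estimand(CELLS))    # dominated by the high-probability cell
    print(regression_estimand(CELLS))  # dominated by the cell with P(D=1|X) near 1/2
```

With effects that decline in the probability of treatment, the variance-weighted average exceeds the treated-weighted average, which is the direction of the gap between regression and matching estimates described above.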

2.3.3  Matching  using  the  propensity  score 

It  is  easy  to  construct  a  matching  estimator  based  on  (37)  when,  as  in  Angrist  (1998),  the  conditioning 
variables  are  discrete  and  the  sample  has  many  observations  at  almost  every  set  of  values  taken  on  by  the  vector 
of explanatory variables. What about situations where \(X_i\) is continuous, so that exact matching is not practical?
Problems  involving  more  finely  distributed  X-variables  are  often  solved  by  aggregating  values  to  make  coarser 
groupings  or  by  pairing  observations  that  have  similar,  though  not  necessarily  identical  values.  See  Cochran 
(1965),  Rubin  (1973),  or  Rosenbaum  (1995,  Chapter  3)  for  discussions  of  this  approach.  More  recently, 
Deaton  and  Paxson  (1998)  used  nonparametric  methods  to  accommodate  continuous-valued  control  variables 
in  a  matching  estimator. 
The problem of how to aggregate the X-variables also motivates a matching method first developed
in a series of papers by Rosenbaum and Rubin (1983, 1984, 1985). These papers show that full control for
observed covariates can be obtained by controlling solely for a particular function of \(X_i\) called the propensity
score, which is simply the conditional probability of treatment, \(p(X_i) \equiv P(D_i=1 \mid X_i)\). The formal result
underlying this approach says that if conditioning on \(X_i\) eliminates selection bias,

\(E[Y_{0i} \mid X_i, D_i=1] = E[Y_{0i} \mid X_i, D_i=0]\),
then it must also be true that conditioning on \(p(X_i)\) eliminates selection bias:

\(E[Y_{0i} \mid p(X_i), D_i=1] = E[Y_{0i} \mid p(X_i), D_i=0]\).
This  leads  to  the  following  modification  of  (36): 

EtY.i-YoilDpl]  =  E{E[Yli|X,,D1=l]-E[YjX,,D,=l]|DFl} 

=  E{E[Y„|  p(Xi),D,=l]-E[Yjp(X,),D,=0]|  Dj=l } 
Of  course,  to  make  this  expression  into  an  estimator,  the  propensity  score  p(X|)  must  first  be  estimated.  The 
practical  value  of  this  result  is  that  in  some  cases,  it  may  be  easier  to  estimate  p(Xj)  and  then  condition  on  the 
estimates  of  p(X,)  than  to  condition  on  X;  directly.  For  example,  even  if  Xj  is  continuous,  p(X()  may  have  some 
"flat  spots",  or  we  may  have  some  prior  information  about  p(X;).  The  propensity  score  approach  is  also 
conceptually  appealing  because  it  focuses  attention  on  variables  that  are  related  to  the  regressor  of  interest. 
Although  Yj  may  vary  with  X,  in  complicated  ways,  this  is  only  of  concern  for  values  of  Xj  where  p(Xj)  varies 
as  well. 

An  example  using  the  propensity  score  in  labor  economics  is  Dehejia  and  Wahba's  (1995)  reanalysis 
of  the  National  Supported  Work  (NSW)  training  program  studied  by  Lalonde  (1986).  The  NSW  provided 
training  to  different  groups  of  "hard-to-employ"  men  and  women  in  a  randomized  demonstration  project. 
Lalonde's study uses observational control groups from the Current Population Survey (CPS) and the Panel
Study of Income Dynamics (PSID) to look at whether econometric methods are likely to generate conclusions
similar to those found in the experimental study. One hurdle facing the non-experimental investigator
attempting to construct a control group for trainees is how to control for lagged earnings. As we noted earlier,
controlling  for  lagged  earnings  is  important  since  participants  in  government  training  programs  are  often 
observed  to  experience  a  decline  in  earnings  before  entering  the  program  (see,  e.g.,  Ashenfelter  and  Card, 
1985,  and  the  Heckman,  Lalonde,  and  Smith  chapter  on  training  in  this  volume). 

Lalonde  (1986)  found  that  non-experimental  methods  based  on  regression  models,  including  models 
with  fixed  effects  and  control  for  lagged  earnings,  fail  to  replicate  the  NSW  experimental  findings.  Using  the 
same  observational  control  groups  as  Lalonde  (1986)  did,  Dehejia  and  Wahba  (1995)  control  for  lagged 
earnings  and  other  covariates  by  first  estimating  a  logit  model  that  relates  participation  in  the  program  to  the 
covariates  and  two  lags  of  earnings.  Following  an  example  by  Rosenbaum  and  Rubin  (1984),  they  then  divide 
the  sample  into  quintiles  on  the  basis  of  fitted  values  from  this  logit,  i.e.,  based  on  estimates  of  the  propensity 
score.  The  overall  estimate  of  the  effect  of  treatment  on  the  treated  is  the  difference  between  average  trainee 
and  average  control  earnings  in  each  quintile,  weighted  by  the  number  of  trainees  in  the  quintile.  The 
estimates  produced  using  this  method  are  similar  to  those  based  on  the  experimental  random  assignment  (and 
apparently  more  reliable  than  regression  estimates).  It  should  be  clear,  however,  that  use  of  propensity  score 
methods  requires  a  number  of  decisions  about  how  to  model  and  control  for  the  score.  There  is  little  in  the 
way  of  formal  statistical  theory  to  guide  this  process,  and  the  question  of  whether  propensity  score  methods 
are  better  than  other  methods  remains  open.  See  Heckman,  Ichimura,  and  Todd  (1997)  for  further  empirical 
evidence,  and  Hahn  (1998)  for  recent  theoretical  results  on  efficiency  considerations  in  these  models. 
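
The stratification step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Dehejia and Wahba's code: the function name `stratified_effect_on_treated`, the toy data, and the choice to skip strata that lack either trainees or comparison observations are all hypothetical, and the propensity scores are taken as already estimated (for example, as fitted values from a logit).

```python
from statistics import mean


def stratified_effect_on_treated(scores, treated, outcomes, n_strata=5):
    """Effect on the treated by stratifying on an estimated propensity score.

    scores   : estimated propensity scores (assumed to be given)
    treated  : 1 for trainees, 0 for comparison observations
    outcomes : the outcome of interest, e.g. post-program earnings
    """
    # Sort observation indices by score and cut them into equal-sized strata.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    total, n_treated = 0.0, 0
    for k in range(n_strata):
        block = order[k * n // n_strata:(k + 1) * n // n_strata]
        y1 = [outcomes[i] for i in block if treated[i] == 1]
        y0 = [outcomes[i] for i in block if treated[i] == 0]
        if not y1 or not y0:
            continue  # no within-stratum contrast is possible; skipped in this sketch
        # Within-stratum contrast, weighted by the number of trainees in the stratum.
        total += len(y1) * (mean(y1) - mean(y0))
        n_treated += len(y1)
    return total / n_treated


# Toy data: earnings rise with the score, and the true treatment effect is 2.
scores = [i / 20 for i in range(20)]
treated = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
earnings = [10 * (i // 4) + 2 * treated[i] for i in range(20)]

estimate = stratified_effect_on_treated(scores, treated, earnings)
naive = (mean(y for y, d in zip(earnings, treated) if d == 1)
         - mean(y for y, d in zip(earnings, treated) if d == 0))
```

In the toy data, trainees are concentrated at high scores where earnings are also high, so the unadjusted contrast `naive` overstates the effect, while `estimate` compares trainees and comparisons with similar scores.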

2.3.4.  Interpreting  instrumental  variables  estimates 

The  discussion  of  IV  in  Section  2.2.3  used  the  example  of  veteran  status,  with  two  potential  outcomes 
and a constant causal effect, $Y_{1i} - Y_{0i} = \delta$. What is the interpretation of an IV estimate when the constant-effects
assumption is relaxed? We first discuss this for a model where the causing variable is binary, as in the veteran
status example, turning afterwards to a more general model. As before, the discussion is initially limited to
the Wald estimator since this is an important and easily-analyzed IV estimator.

Without the constant-effects assumption, we can write the observed outcome, $Y_i$, in terms of potential
outcomes as

(39)  $$Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i = \beta_0 + \delta_i D_i + \eta_i,$$

where $\beta_0 \equiv E[Y_{0i}]$ and $\delta_i \equiv Y_{1i} - Y_{0i}$ is the heterogeneous causal effect. The expression after the second equals
sign is a "random-coefficients" version of the causal model in Section 2.3.3 (see equation 23). To facilitate
the discussion of IV, we also introduce some notation for the first-stage relationship between the causing
variable, $D_i$, and the binary instrument, $Z_i$. To allow for as much heterogeneity as possible, the first stage
equation is written in a manner similar to (39):

(40)  $$D_i = D_{0i} + (D_{1i} - D_{0i}) Z_i = \pi_0 + \pi_{1i} Z_i + \nu_i,$$

where $\pi_0 \equiv E[D_{0i}]$ and $\pi_{1i} \equiv (D_{1i} - D_{0i})$ is the causal effect of the instrument on $D_i$. In the draft lottery example,
$D_{0i}$ tells us whether i would serve in the military if not draft-eligible and $D_{1i}$ tells us whether i would serve
when draft-eligible. The effect of draft-eligibility on $D_i$ is the difference between these two potential treatment
assignments.

The principal identifying assumption in this setup is that the vector of potential outcomes and potential
treatment assignments is jointly independent of the instrument. Formally,

$$\{Y_{1i}, Y_{0i}, D_{1i}, D_{0i}\} \perp\!\!\!\perp Z_i,$$

where "$\perp\!\!\!\perp$" is notation for statistical independence (see, e.g., Dawid, 1979, or Rosenbaum and Rubin, 1983).34
In the lottery example, $Z_i$ is clearly independent of $\{D_{0i}, D_{1i}\}$ since $Z_i$ was randomly assigned. As noted in
section 2.3.3, however, independence of $\{Y_{0i}, Y_{1i}\}$ and $Z_i$ is not guaranteed by randomization since $Y_{0i}$ and
$Y_{1i}$ refer to potential outcomes under alternative assignments of veteran status and not $Z_i$ itself. Even though
$Z_i$ was randomly assigned, so the relationship between $Z_i$ and $Y_i$ is clearly causal, in principle there might be
reasons other than veteran status for an effect of draft-eligibility on earnings. The independence assumption,
which is similar to the assumption that $Z_i$ and $\eta_i$ are uncorrelated in the constant-effects model, rules this
possibility out.

34 The independence assumption using random-coefficients notation is $\{\delta_i, \eta_i, \pi_{1i}, \nu_i\} \perp\!\!\!\perp Z_i$.

A second assumption that is useful here, and one that does not arise in a constant-effects setting, is that
either $\pi_{1i} \geq 0$ for all i or $\pi_{1i} \leq 0$ for all i. This monotonicity assumption, introduced by Imbens and Angrist
(1994), means that while the instrument may have no effect on some people, it must be the case that the
instrument acts in only one direction, either $D_{1i} \geq D_{0i}$ or $D_{1i} \leq D_{0i}$ for all i. In what follows, we assume $D_{1i} \geq D_{0i}$
for all i. In the draft-lottery example, this means that although draft-eligibility may have had no effect on the
probability of military service for some men, there is no one who was actually kept out of the military by being
draft-eligible. Without monotonicity, instrumental variables estimators are not guaranteed to estimate a
weighted average of the underlying causal effects, $Y_{1i} - Y_{0i}$.

Given independence and monotonicity, the Wald estimator in this example can be interpreted as the
effect of veteran status on those whose treatment status was changed by the instrument. This parameter is
called the local average treatment effect (LATE; Imbens and Angrist, 1994), and can be written as follows:

$$\frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0]} = E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] = E[\delta_i \mid \pi_{1i} > 0].$$

Thus, IV estimates of effects of military service using the draft lottery estimate the effect of military service
on men who served because they were draft-eligible, but would not otherwise have served.35 This obviously
excludes volunteers and men who were exempted from military service for medical reasons, but it includes
men for whom the draft policy was binding. Much of the debate over compulsory military service focused on
draftees, so LATE is clearly a parameter of policy interest in the Vietnam context.
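
The LATE result can be checked numerically on a small made-up population. The sketch below is illustrative only: the five individuals and their potential outcomes are hypothetical, and independence is imposed by evaluating each conditional expectation over the whole population, as if the instrument were assigned at random.

```python
from statistics import mean

# Hypothetical finite population: (y0, y1, d0, d1) for each person, where
# d0 / d1 is treatment status with the instrument switched off / on.
# Monotonicity holds because d1 >= d0 for everyone.
population = [
    (10, 14, 0, 1),  # complier, causal effect 4
    (12, 14, 0, 1),  # complier, causal effect 2
    (20, 29, 1, 1),  # always-taker (e.g. a volunteer), causal effect 9
    (15, 15, 0, 0),  # never-taker, causal effect 0
    (18, 11, 0, 0),  # never-taker, causal effect -7
]


def wald(pop):
    """Wald ratio when Z is independent of all potential outcomes."""
    # Under independence, E[Y | Z=z] is the population mean of y0 + (y1 - y0) * d_z.
    y_z1 = mean(y0 + (y1 - y0) * d1 for y0, y1, d0, d1 in pop)
    y_z0 = mean(y0 + (y1 - y0) * d0 for y0, y1, d0, d1 in pop)
    d_z1 = mean(d1 for _, _, _, d1 in pop)
    d_z0 = mean(d0 for _, _, d0, _ in pop)
    return (y_z1 - y_z0) / (d_z1 - d_z0)


# Average effect among those whose treatment status the instrument changes.
late = mean(y1 - y0 for y0, y1, d0, d1 in population if d1 > d0)
# Average effect in the whole population, shown for comparison.
average_effect = mean(y1 - y0 for y0, y1, _, _ in population)
wald_estimate = wald(population)
```

Here `wald_estimate` coincides with `late` (up to floating-point rounding) and differs from `average_effect`, because the always-taker and the never-takers contribute nothing to either the numerator or the denominator of the Wald ratio.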

The LATE parameter can be linked to the parameters in traditional econometric models for causal
effects. One commonly used specification for dummy endogenous regressors like veteran status is a latent-index
model (see, e.g., Heckman, 1978), where

$$D_i = 1 \text{ if } \gamma_0 + \gamma_1 Z_i > \nu_i \text{ and } 0 \text{ otherwise},$$

and $\nu_i$ is a random factor assumed to be independent of $Z_i$. This specification can be motivated by comparisons
of utilities and costs under alternative choices. In the notation of equation (40), the latent-index model
characterizes potential treatment assignments as:

$$D_{0i} = 1 \text{ if } [\gamma_0 > \nu_i] \text{ and } D_{1i} = 1 \text{ if } [\gamma_0 + \gamma_1 > \nu_i].$$

Note that in this model, monotonicity is automatically satisfied since $\gamma_1$ is a constant. Assuming $\gamma_1 > 0$,

$$E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] = E[Y_{1i} - Y_{0i} \mid \gamma_0 + \gamma_1 > \nu_i > \gamma_0],$$

which is a function of the structural first-stage parameters, $\gamma_0$ and $\gamma_1$. The LATE parameter is representative
of a larger group the larger is the first-stage parameter, $\gamma_1$.

35 Proof of the LATE result: $E[Y_i \mid Z_i=1] = E[Y_{0i} + (Y_{1i} - Y_{0i}) D_i \mid Z_i=1]$, which equals $E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{1i}]$ by
independence. Likewise $E[Y_i \mid Z_i=0] = E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{0i}]$, so the numerator of the Wald estimator is
$E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})]$. Monotonicity means $D_{1i} - D_{0i}$ equals one or zero, so $E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})] =
E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] P[D_{1i} > D_{0i}]$. A similar argument shows $E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0] = E[D_{1i} - D_{0i}] = P[D_{1i} > D_{0i}]$.

LATE can also be compared with the effect of treatment on the treated for this problem, which
depends on the same first-stage parameters and the marginal distribution of $Z_i$. Note that in the latent-index
specification, $D_i=1$ in one of two ways: either $\gamma_0 > \nu_i$, in which case the instrument doesn't matter, or $\gamma_0 + \gamma_1
> \nu_i > \gamma_0$ and $Z_i=1$. Since these two possibilities partition the group with $D_i=1$, we can write

$$E[Y_{1i} - Y_{0i} \mid D_i=1] = P(D_i=1)^{-1} \times \{ E[Y_{1i} - Y_{0i} \mid \gamma_0 + \gamma_1 > \nu_i > \gamma_0, Z_i=1] \, P(\gamma_0 + \gamma_1 > \nu_i > \gamma_0, Z_i=1) + E[Y_{1i} - Y_{0i} \mid \gamma_0 > \nu_i] \, P(\gamma_0 > \nu_i) \}$$

$$= P(D_i=1)^{-1} \times \{ E[Y_{1i} - Y_{0i} \mid \gamma_0 + \gamma_1 > \nu_i > \gamma_0] \, P(\gamma_0 + \gamma_1 > \nu_i > \gamma_0) P(Z_i=1) + E[Y_{1i} - Y_{0i} \mid \gamma_0 > \nu_i] \, P(\gamma_0 > \nu_i) \}.$$

This shows that the effect on the treated is a weighted average of LATE and the effect on men whose treatment
status is unaffected by the instrument.36 Note, however, that although LATE equals the Wald estimator, the
effect on the treated is not identified in this case without additional assumptions (see, e.g., Angrist and Imbens,
1991).

36 Note that $P[\gamma_0 + \gamma_1 > \nu_i > \gamma_0] P[Z_i=1] + P[\gamma_0 > \nu_i] = (E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0]) P(Z_i=1) + E[D_i \mid Z_i=0] = P[D_i=1]$, so the
weights sum to one. In the special case where $P[\gamma_0 > \nu_i]=0$ for everyone, LATE and the effect of treatment on the
treated are the same.

Interpreting IV estimates with cardinal variables

So  far  the  discussion  of  IV  has  focused  on  models  with  a  binary  regressor.  What  does  the  Wald 
estimator  estimate  when  the  regressor  takes  on  more  than  two  values,  like  schooling?  As  in  the  discussion  of 
regression  in  Section  2.2.1,  suppose  the  causal  relationship  of  interest  is  characterized  by  a  function  that 
describes  exactly  what  a  given  individual  would  earn  if  they  obtained  different  levels  of  education.  This 
relationship is person-specific, so we write $f_i(S)$ to denote the earnings or wage that i would receive after
obtaining S years of education. The observed earnings level is $Y_i = f_i(S_i)$.

Again, it is useful to have a general notation for the first-stage relationship between $S_i$ and $Z_i$:

(41)  $$S_i = S_{0i} + (S_{1i} - S_{0i}) Z_i = \phi_0 + \phi_{1i} Z_i + \nu_i,$$

where $S_{0i}$ is the schooling i would get if $Z_i=0$, $S_{1i}$ is the schooling i would get if $Z_i=1$, and $\phi_0 \equiv E[S_{0i}]$. In
random-coefficients notation, the causal effect of $Z_i$ on $S_i$ is $\phi_{1i} \equiv S_{1i} - S_{0i}$. To make this concrete, suppose the
instrument is a dummy for being born in the second, third, or fourth quarter of the year, as for the Wald
estimate in Angrist and Krueger (1991, Table 3). Since compulsory attendance laws allow people to drop out
of school on their birthday (typically the 16th) and most children enter school in September of the year they
turn 6, pupils born later in the year are kept in school longer than those born earlier. In this example, $S_{0i}$ is the
schooling i would get if born in the first quarter and $S_{1i}$ is the schooling i would get if born in a later quarter.

Now the independence assumption is $\{f_i(S), S_{1i}, S_{0i}\} \perp\!\!\!\perp Z_i$ and the monotonicity assumption is $S_{1i} \geq S_{0i}$.
This means the instrument is independent of what an individual could earn with schooling level S, and
independent of the random elements in the first stage.37 Using the independence assumption and equation (41)
to substitute for $S_i$, the Wald estimator can be written:

(42)  $$\frac{E[f_i(S_i) \mid Z_i=1] - E[f_i(S_i) \mid Z_i=0]}{E[S_i \mid Z_i=1] - E[S_i \mid Z_i=0]} = \frac{E[f_i(S_{1i}) - f_i(S_{0i})]}{E[S_{1i} - S_{0i}]} = E\{\omega_i [f_i(S_{1i}) - f_i(S_{0i})] / (S_{1i} - S_{0i})\},$$

where $\omega_i \equiv (S_{1i} - S_{0i}) / E[S_{1i} - S_{0i}]$. This is a weighted average arc slope of $f_i(S)$ on the interval $[S_{0i}, S_{1i}]$. We can
simplify further using the fact that $f_i(S_{1i}) = f_i(S_{0i}) + f_i'(S_i^*)(S_{1i} - S_{0i})$, for some $S_i^*$ in the interval $[S_{0i}, S_{1i}]$.38 Now
we can write the Wald estimator as an average derivative:

(43)  $$\frac{E[f_i(S_{1i}) - f_i(S_{0i})]}{E[S_{1i} - S_{0i}]} = \frac{E[(S_{1i} - S_{0i}) f_i'(S_i^*)]}{E[S_{1i} - S_{0i}]} = E[\omega_i f_i'(S_i^*)].$$

Given the monotonicity assumption, $\omega_i$ is positive for everyone, so the Wald estimator is a weighted average
of individual-specific slopes at a point in the interval $[S_{0i}, S_{1i}]$. The weight each person gets is proportional
to the size of the causal effect of the instrument on him or her. The range of variation in $f_i(S)$ summarized by
this average is always between $S_{0i}$ and $S_{1i}$.

37 For example, if $f_i(S) = \beta_0 + \rho_i S + \eta_i$, then we assume $\{\rho_i, \eta_i, \phi_{1i}, \nu_i\}$ are independent of $Z_i$.

Angrist,  Imbens,  and  Graddy  (1995)  note  that  the  Wald  estimator  can  be  characterized  more  precisely 
in a number of important special cases. First, suppose that the effect of the instrument is the same for
everybody, i.e., $\phi_{1i}$ is constant. Then we obtain the average derivative $E[f_i'(S_i^*)]$, and no weighting is involved.
If $f_i(S)$ is linear in S, as in Section 2.2.1, but with a random coefficient, $\rho_i$, then the Wald estimator is a weighted
average of the random coefficient: $E[(S_{1i} - S_{0i}) \rho_i] / E[S_{1i} - S_{0i}]$. If $\phi_{1i}$ is constant and $f_i(S)$ is linear, then the
Wald estimator is the population average slope, $E[\rho_i]$.
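
The linear random-coefficients case is easy to verify by hand. The following sketch uses a hypothetical three-person population (the slopes and schooling levels are made up) to show that the ratio in (42) equals the weighted average of slopes, with weights proportional to the first-stage effect, and that this generally differs from the unweighted average slope.

```python
# Hypothetical population: (rho_i, s0_i, s1_i), with linear f_i(S) = rho_i * S.
people = [
    (0.05, 12, 12),  # instrument has no effect on schooling: weight zero
    (0.08, 10, 11),  # one extra year of schooling
    (0.12, 9, 11),   # two extra years of schooling: twice the weight
]


def wald_linear(pop):
    """Ratio of E[f_i(S_1i) - f_i(S_0i)] to E[S_1i - S_0i], as in (42)."""
    numerator = sum(rho * s1 - rho * s0 for rho, s0, s1 in pop)
    denominator = sum(s1 - s0 for _, s0, s1 in pop)
    return numerator / denominator


def weighted_slope(pop):
    """E[omega_i * rho_i] with omega_i = (S_1i - S_0i) / E[S_1i - S_0i]."""
    mean_gap = sum(s1 - s0 for _, s0, s1 in pop) / len(pop)
    return sum((s1 - s0) / mean_gap * rho for rho, s0, s1 in pop) / len(pop)


wald_value = wald_linear(people)
weighted_value = weighted_slope(people)
simple_average = sum(rho for rho, _, _ in people) / len(people)
```

Because the person with the largest slope is also the one most affected by the instrument in this made-up example, `wald_value` exceeds `simple_average`; with a constant first-stage effect the two would coincide.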

Another  interesting  special  case  is  when  fj(S)  is  a  quadratic  function  of  S,  as  in  Lang  (1993)  and  Card's 
(1995)  parameterization  of  a  structural  human-capital  earnings  function.  The  quadratic  function  captures  the 
notion that returns to schooling decline as schooling increases. Note that for a quadratic function, the point
of linearization is always $S_i^* = (S_{1i} + S_{0i})/2$. The Wald estimator is therefore

$$E[\omega_i f_i'((S_{1i} + S_{0i})/2)],$$

i.e., a weighted average of individual slopes at the midpoint of the interval $[S_{0i}, S_{1i}]$ for each person. The fact
that the weights are proportional to $S_{1i} - S_{0i}$ sometimes has economic significance. In the Card and Lang
models, for example, the first-stage effect, $S_{1i} - S_{0i}$, is assumed to be proportional to individual discount rates.
Since people with higher discount rates get less schooling and the schooling-earnings relationship has been
assumed to be concave, this tends to make the Wald estimate higher than the population average return. Lang
(1993) called this phenomenon "discount rate bias."

38 Here we assume that $f_i(S)$ is continuously differentiable with domain equal to a subset of the real line.

In  some  applications,  it  is  interesting  to  characterize  the  range  of  variation  captured  by  the  Wald 
estimator further. Returning to (42), which describes the estimator as a weighted average of slopes in the
interval $[S_{0i}, S_{1i}]$, it seems natural to ask which values S are most likely to be covered by this interval. For
example, does $[S_{0i}, S_{1i}]$ usually cover 12 years of education, or is it more likely to cover 16 years? The
probability $S \in [S_{0i}, S_{1i}]$ is $P[S_{1i} \geq S \geq S_{0i}]$. Because $S_i$ is discrete, it is easier to work with $P[S_{1i} > S \geq S_{0i}]$, since this
can be expressed as

(44)  $$P[S_{1i} > S \geq S_{0i}] = P[S_{1i} > S] - P[S_{0i} > S] = P[S_i \leq S \mid Z_i=0] - P[S_i \leq S \mid Z_i=1].$$

This is the difference in the cumulative distribution function (CDF) of schooling with the instrument switched
off and on. The schooling values where the CDF-gap is largest are those most likely to be covered by the
interval $[S_{0i}, S_{1i}]$, and therefore most often represented in the Wald/weighted average.
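
The CDF comparison in (44) requires only the two empirical distributions of schooling. A minimal sketch follows, using made-up schooling data; the names `cdf`, `coverage`, `schooling_z0`, and `schooling_z1` are hypothetical and are not taken from Angrist and Imbens (1995).

```python
def cdf(sample, s):
    """Empirical P[S_i <= s]."""
    return sum(1 for x in sample if x <= s) / len(sample)


def coverage(sample_z0, sample_z1, s):
    """Equation (44): P[S_1i > s >= S_0i] as CDF(s | Z=0) minus CDF(s | Z=1)."""
    return cdf(sample_z0, s) - cdf(sample_z1, s)


# Hypothetical years of schooling with the instrument switched off and on.
schooling_z0 = [10, 11, 11, 12, 12, 12, 14, 16]
schooling_z1 = [10, 12, 12, 12, 12, 13, 14, 16]

# CDF gap at each schooling level, and the level where the gap is largest.
gaps = {s: coverage(schooling_z0, schooling_z1, s) for s in range(9, 17)}
most_covered = max(gaps, key=gaps.get)
```

In these made-up data the instrument moves two people from 11 to 12 years and one person from 12 to 13 years, so the gap is largest at 11 years, the schooling level most often strictly exceeded because of the instrument.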

Angrist  and  Imbens  (1995)  used  equation  (44)  to  interpret  the  Wald  estimates  of  the  returns  to 
schooling  reported  by  Angrist  and  Krueger  (1991).39  They  report  a  Wald  estimate  based  on  first  quarter/fourth 
quarter  differences  in  log  weekly  wages  and  years  of  schooling  using  data  on  men  born  1930-39  in  the  1980 
Census.  Their  Wald  estimate  is  .089,  and  the  corresponding  OLS  estimate  is  .07.  The  first  quarter/fourth 
quarter  difference  in  CDFs  is  plotted  in  Figure  5.  The  difference  is  largest  in  the  8-14  years-of-schooling 
range.  This  is  not  surprising  since  compulsory  attendance  laws  mainly  affect  high  school  students,  i.e.,  those 
with 8-12 years of education. The CDF gap for men with more than 12 years of schooling may be caused by
men who are compelled to complete high school but then attend college later.

Finally,  we  note  that  the  discussion  of  IV  in  heterogeneous  and  nonlinear  models  so  far  has  ignored 
covariates.  2SLS  estimates  in  heterogeneous-outcomes  models  with  covariates  can  be  interpreted  in  much  the 
same way as regression estimates of models with covariates were interpreted above. That is, IV estimates in
models with covariates can be thought of as producing a weighted average of covariate-specific Wald estimates
as long as the model for covariates is saturated and $E[S_i \mid X_i, Z_i]$ is used as an instrument. In other cases it
seems reasonable to assume that some sort of approximate weighted average is being generated, but we are
unaware of a precise causal interpretation that fits all cases.40

39 See also Kling (1998) for a similar analysis of instrumental variables estimates using distance to college as an
instrument for schooling.

2.4  Refutability 

Causality  can  never  be  proved  by  associations  in  non-experimental  data.  But  sometimes  the  lack  of 
association  between  variables  for  a  particular  group,  or  the  occurrence  of  an  association  between  the  "causing 
variable"  and  outcome  variable  for  a  group  thought  to  be  unaffected  by  the  treatment,  can  cast  doubt  on,  or 
even  refute,  a  causal  interpretation.  R.A.  Fisher  (quoted  in  Cochran,  1965)  argued  that  the  case  for  causality 
is  stronger  when  the  causal  model  has  many  implications  that  appear  to  hold.  For  this  reason,  he  suggested 
that  scientific  theories  be  made  "complicated,"  in  the  sense  that  they  yield  many  testable  implications. 

A  research  design  is  more  likely  to  be  successful  at  assessing  causality  if  possibilities  for  checking 
collateral  implications  of  causal  processes  are  "built  in."  At  one  level,  this  involves  estimating  less  restrictive 
models.  A  good  example  is  Freeman's  (1984)  panel  data  study  of  union  status,  which  looks  separately  at 
workers  who  join  unions  and  leave  unions.  If  unions  truly  raise  wages  of  their  members,  then  workers  who 
move  from  nonunion  to  union  jobs  should  experience  a  raise,  and  workers  who  move  from  union  to  nonunion 
jobs  should  experience  a  pay  cut.  Although  a  less  restrictive  model  may  yield  imprecise  estimates  or  be  subject 
to  different  biases  which  render  the  results  difficult  to  interpret  (e.g.,  different  unobserved  variables  may  cause 
workers  to  join  and  exit  union  jobs),  a  causal  story  is  strengthened  if  the  results  of  estimating  a  less  restrictive 
model  are  consistent  with  the  story. 

40 A recent effort in this direction is Abadie (1998), who presents conditions under which 2SLS estimates can be
interpreted as the best linear predictor for an underlying causal relationship. He also introduces a new IV estimator
that always has this property for models with a single binary instrument.

In addition to these considerations of robustness, a causal model will often yield testable predictions
for sub-populations in which the "treatment effect" should not be observed, either because the sub-population
is thought to be immune to the treatment or did not receive the treatment. Perhaps the best-known example
of this type of analysis is Bound's (1989) study of the effect of Disability Insurance (DI) benefits on the labor
force participation rate of older men. Earlier studies (e.g., Parsons, 1980) established an inverse relationship
between the participation rate and the DI benefit-wage replacement ratio. But because the replacement ratio
is a decreasing function of a worker's past earnings, Bound argued that this association may reflect patterns
of labor force participation rather than a causal response to DI benefits.41

To  test  the  causal  interpretation  of  earlier  work,  Bound  performed  two  types  of  analyses.  First,  he 
estimated  essentially  the  same  econometric  model  of  the  relationship  between  employment  and  potential  DI 
benefits  that  had  been  estimated  previously,  except  he  estimated  the  model  for  a  sub-sample  of  older  men  who 
had  never  applied  for  DI.  Because  one  would  not  expect  DI  benefits  to  provide  a  strong  work  disincentive  for 
this  sub-sample,  there  should  be  a  much  weaker  relationship,  or  no  relationship  at  all,  if  the  causal 
interpretation  of  DI  benefit  coefficients  is  correct.  Instead,  he  found  that  DI  benefits  had  about  the  same  effect 
in  this  sample  as  in  a  sample  that  included  men  who  actually  applied  for  and  received  DI  benefits,  suggesting 
that  a  causal  interpretation  of  the  effect  of  DI  benefits  was  not  warranted.  Second,  Bound  examined  the  labor 
force  behavior  of  men  who  applied  for  DI  but  were  turned  down.  He  reasoned  that  because  men  in  this  sub- 
sample  were  less  severely  disabled  than  men  who  received  DI,  the  labor  force  participation  rate  of  this  sub- 
sample  provided  a  "natural  'control'  group"  (p.  482)  for  predicting  the  upper  bound  of  the  labor  force 
participation  rate  of  DI  recipients  had  they  been  denied  DI  benefits.  Because  half  of  the  presumably  healthier 
rejected  DI  applicants  did  not  work  even  without  receiving  benefits,  Bound  concluded  that  most  DI  recipients 
did  not  work  because  they  were  disabled,  not  because  DI  benefits  induced  them  to  leave  the  labor  force. 

Notions of "refutability" also carry over to IV models. In Angrist and Krueger (1991) we were
concerned that quarter of birth, which was the instrument for schooling, might have influenced educational
attainment through some mechanism other than the interaction of school start age and compulsory schooling
laws. To test this threat to a causal interpretation of the IV estimates, we examined whether quarter of birth
influenced schooling or earnings for college graduates, who presumably were unaffected by compulsory
schooling laws. Although quarter of birth had an effect on these outcomes for college graduates, the effect was
weak and had a different pattern than that found for the less-than-college group, suggesting that compulsory
schooling was responsible for the effects of quarter of birth in the less-than-college sample.

41 Welch (1977) provides a closely related criticism of work on Unemployment Insurance benefits.

Tests  of  refutability  may  have  flaws.  It  is  possible,  for  example,  that  a  subpopulation  that  is  believed 
unaffected  by  the  intervention  is  indirectly  affected  by  it.  For  example,  Parsons  (1991)  argues  that  rejected  DI 
applicants  are  a  misleading  control  group  because  they  may  exit  the  labor  force  to  strengthen  a  possible  appeal 
of  their  rejected  application  or  a  future  re-application  for  DI  benefits.42  Likewise,  some  students  who  complete 
high  school  because  of  compulsory  schooling  may  be  induced  to  go  on  to  college  as  a  result,  invalidating  our 
1991  test  of  refutability.  An  understanding  of  the  institutions  underlying  the  program  being  evaluated  is 
necessary to assess tests of refutability, as well as to identify subpopulations that are immune from the
intervention  according  to  the  causal  story  but  still  subject  to  possible  confounding  effects. 

Lastly,  there  has  been  much  recent  interest  in  evaluating  entire  research  designs,  as  in  Lalonde's 
(1986)  landmark  study  comparing  experimental  and  non-experimental  research  methods.  Only  rarely,  however, 
have  experiments  been  conducted  that  can  be  used  to  validate  non-experimental  research  strategies. 
Nonetheless,  non-experimental  research  designs  can  still  be  assessed  by  comparing  "pre-treatment"  trends  for 
the  treatment  and  comparison  group  (e.g.,  Ashenfelter  and  Card,  1985,  and  Heckman  and  Hotz,  1989)  or  by 
looking for effects where there should be none (e.g., Bound, 1989). We provide another illustration of this
point  with  some  new  evidence  on  the  differences-in-differences  approach  used  in  Card's  (1990)  immigration 
study. 

In the summer of 1994, tens of thousands of Cubans boarded boats destined for Miami in an attempt
to emigrate to the United States in a second Mariel Boatlift that promised to be almost as large as the first one,
which occurred in the summer of 1980. Wishing to avoid the political fallout that accompanied the earlier
boatlift, the Clinton Administration interceded and ordered the Navy to divert the would-be immigrants to a
base in Guantanamo Bay. Only a small fraction of the Cuban emigres ever reached the shores of Miami.
Hence, we call this event, "The Mariel Boatlift That Didn't Happen."

42 Bound (1989) considered and rejected these threats to his control group. Also see Bound's (1991) response to
Parsons (1991).

Had  the  migrants  been  allowed  to  reach  the  United  States,  there  is  little  doubt  that  researchers  would 
have  used  this  "natural  experiment"  to  extend  Card's  (1990)  influential  study  of  the  earlier  influx  of  Cuban 
immigrants.  Nonetheless,  we  can  use  this  "non-event"  to  explore  Card's  research  design.  In  particular,  we 
can  ask  whether  Miami's  and  the  comparison  cities'  experiences  were  in  fact  similar  absent  the  large  wave  of 
immigrants  to  Miami.  Figure  1,  which  we  referred  to  earlier  in  the  discussion  of  Card's  paper,  shows  that 
nonagricultural  employment  growth  in  Miami  tracks  that  of  the  four  comparison  cities  rather  well  in  the  year 
before  and  few  years  after  the  summer  of  1994.  (A  vertical  bar  indicates  the  date  of  the  thwarted  boatlift.)  To 
provide  a  more  detailed  analysis  by  ethnic  group,  we  followed  Card  and  calculated  unemployment  rates  for 
Whites,  Blacks  and  Hispanics  in  Miami  and  the  four  comparison  cities  using  data  from  the  CPS  Outgoing 
Rotation  Groups.  These  results  are  reported  in  Table  7. 

The  Miami  unemployment  data  are  imprecise  and  variable,  but  still  indicate  a  large  increase  in 
unemployment  in  1994,  the  year  the  immigrants  were  diverted  to  Guantanamo  Bay.  On  the  other  hand,  1994 
was  the  first  year  the  CPS  redesign  was  implemented  (see  Section  3.1).  We  therefore  take  1993  as  the  "pre" 
period  and  1995  as  the  "post"  period  for  a  difference-in-difference  comparison.  For  Whites  and  Hispanics, 
the  unemployment  rate  fell  in  Miami  and  fell  even  more  in  the  comparison  cities  between  the  pre  and  post 
periods,  though  the  difference  between  these  two  changes  is  not  significant.  This  is  consistent  with  a  causal 
interpretation  of  Card's  (1990)  results,  which  attributes  the  difference-in-differences  to  the  effect  of 
immigration.  For  blacks,  however,  the  unemployment  rate  rose  by  3.6  percentage  points  in  Miami  between 
1993  and  1995,  while  it  fell  by  2.7  points  in  the  comparison  cities.  The  6.3  point  difference-in-differences 
estimate  is  on  the  margin  of  statistical  significance  (t=1.70),  and  would  have  made  it  look  like  the  immigrant 
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flow  had  a  negative  impact  on  Blacks  in  Miami  in  a  DD  study.  Since  there  was  no  immigration  shock  in  1994, 
this  illustrates  that  different  labor  market  trends  can  generate  spurious  findings  in  research  of  this  type. 
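
The arithmetic behind the estimate for Blacks can be sketched as follows. Only the two changes and the t-ratio are taken from the text; the standard error is backed out from them and is therefore an approximation, and the function name is ours.

```python
# Difference-in-differences from the two changes quoted in the text.
# The standard error is NOT reported in the text; it is backed out
# from the reported estimate and t-ratio, so treat it as approximate.

def diff_in_diff(change_treated: float, change_comparison: float) -> float:
    """Change in the treated group minus change in the comparison group."""
    return change_treated - change_comparison

miami_change = 3.6        # Black unemployment rate, Miami, 1993 to 1995 (points)
comparison_change = -2.7  # same change in the four comparison cities (points)

dd = diff_in_diff(miami_change, comparison_change)
reported_t = 1.70               # t-ratio reported in the text
implied_se = dd / reported_t    # implied, not reported by the authors

print(f"DD estimate: {dd:.1f} points")
print(f"Implied standard error: {implied_se:.1f} points")
```

With an implied standard error of this size, a DD estimate would need to be roughly 7 points or more to clear the conventional 5 percent threshold.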

3.  Data  Collection  Strategies 

Table  1  documents  that  labor  economists  use  many  different  types  of  data  sets.  The  renewed  emphasis 
on  quasi-experiments  in  empirical  research  places  a  premium  on  finding  data  sets  for  a  particular  population 
and  time  period  containing  certain  key  variables.  Often  this  type  of  analysis  requires  large  samples,  because 
only  part  of  the  variation  in  the  variables  of  interest  is  used  in  the  estimation.  Familiarity  with  data  sets  is  as 
necessary for modern labor economics as is familiarity with economic theory or econometrics. Knowledge of
the  populations  covered  by  the  main  surveys,  the  design  of  the  surveys,  the  response  rate,  the  variables 
collected,  the  size  of  the  samples,  the  frequency  of  the  surveys,  and  any  changes  in  the  surveys  over  time  is 
essential  for  successfully  implementing  an  empirical  strategy  and  for  evaluating  others'  empirical  research. 
This  section  provides  an  overview  of  the  most  commonly  used  data  sets  and  data  collection  strategies  in  labor 
economics. 

3.1  Secondary  Data  Sets 

The  most  commonly  used  secondary  data  sets  in  labor  economics  are  the  National  Longitudinal 
Surveys  (NLS),  the  Current  Population  Survey  (CPS),  the  Panel  Study  of  Income  Dynamics  (PSID),  and  the 
Decennial  Censuses.  Table  8  summarizes  several  features  of  the  main  secondary  data  sets  used  by  labor 
economists.  Below  we  provide  a  more  detailed  discussion  of  the  "big  three"  micro  data  sets  in  labor 
economics:  the  NLS,  CPS  and  PSID,  and  then  discuss  other  aspects  of  secondary  data  sets. 

Perhaps  because  of  its  easy-to-use  CD-ROM  format  and  the  breadth  of  its  questionnaire,  the  National 
Longitudinal  Surveys  are  popular  in  applied  work.  The  NLS  actually  consists  of  six  age-by-gender  data  sets: 
a cohort of 5,020 "older men" (age 45-59 in 1966); a cohort of 5,083 mature women (age 30-44 in 1967); a
cohort of 5,225 young men (age 14-24 in 1966); a cohort of 5,159 young women (age 14-24 in 1968);
a  cohort  of  12,686  "youth"  known  as  the  NLSY  (age  14-22  in  1979);  and  a  cohort  of  7,035  children  of 
respondents  in  the  NLSY  (age  0-20  in  1986).42    Sampled  individuals  are  interviewed  annually.  All  but  the 
older  men  and  young  men  surveys  continue  today. 

The  CPS  is  an  ongoing  survey  of  more  than  50,000  households  that  is  conducted  each  month  by  the 
Census  Bureau  for  the  Bureau  of  Labor  Statistics  (BLS).43  Sampled  households  are  included  in  the  survey  for 
four  consecutive  months,  out  of  the  sample  for  8  months,  and  then  included  for  a  final  four  consecutive 
months.  Thus,  the  survey  has  a  "rotation  group"  design,  with  new  rotation  groups  joining  or  exiting  the  sample 
each  month.  The  resulting  data  are  used  by  the  Bureau  of  Labor  Statistics  to  calculate  the  unemployment  rate 
and  other  labor  force  statistics  each  month.  The  CPS  has  a  hierarchical  household-family-person  record 
structure  which  enables  household-level  and  family-level  analyses,  as  well  as  individual-level  analyses.  The 
design of the CPS has been copied by statistical agencies in several other countries, where it is used to calculate
labor  force  statistics. 
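
The rotation pattern described above (four months in, eight months out, four months in) can be sketched in a few lines. Months are indexed by consecutive integers, and the function names are hypothetical; the sketch is meant only to show how much of the sample overlaps from one month to the next and from one year to the next.

```python
from typing import List

def months_in_sample(entry_month: int) -> List[int]:
    """Months in which a household entering in `entry_month` is interviewed.

    The household is in the sample for 4 months, out for 8, then in for 4.
    """
    first_spell = [entry_month + k for k in range(4)]
    second_spell = [entry_month + 12 + k for k in range(4)]
    return first_spell + second_spell

def rotation_groups_present(month: int) -> List[int]:
    """Entry months of every rotation group interviewed in `month`."""
    return [entry for entry in range(month - 15, month + 1)
            if month in months_in_sample(entry)]

def overlap_share(month_a: int, month_b: int) -> float:
    """Share of month_a's rotation groups that are also interviewed in month_b."""
    groups_a = set(rotation_groups_present(month_a))
    groups_b = set(rotation_groups_present(month_b))
    return len(groups_a & groups_b) / len(groups_a)

print(months_in_sample(0))      # [0, 1, 2, 3, 12, 13, 14, 15]
print(overlap_share(20, 21))    # 0.75: adjacent months
print(overlap_share(20, 32))    # 0.5: same month one year later
```

Under this design eight rotation groups are interviewed in any month, three quarters of them are interviewed again the following month, and half are interviewed again twelve months later.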

In  the  U.S.,  regular  and  one-time  supplements  are  included  in  the  survey  to  collect  information  on 
worker  displacement,  contingent  work,  school  enrollment,  smoking,  voting,  and  other  important  behaviors. 
In addition, annual income data from several sources are collected each March. A great strength of the CPS
is that the survey began in the 1940s, so a long time-series of data is available; on the other hand, there have
been  several  changes  that  affect  the  comparability  of  the  data  over  time,  and  micro  data  are  only  available  to 
researchers  for  years  since  1964.  In  addition,  because  of  its  rotation  group  design,  continuing  households  can 
be  linked  from  one  month  to  the  next,  or  between  years;  however,  individuals  who  move  out  of  sampled 
households  are  not  tracked,  and  it  is  possible  that  individuals  who  move  into  a  sampled  household  may  be 
mismatched to other individuals' earlier records. High attrition rates are a particular problem in the linked


42See  NLS  Users' Guide  1995  for  further  information. 

43See Polivka (1996) for an analysis of recent changes in the CPS, and for a list of supplements.

CPS for young workers. Unless a very large sample size is required, it is often preferable to use a data set that
was  designed  to  track  respondents  longitudinally,  instead  of  a  linked  CPS. 

The  PSID  is  a  national  probability  sample  that  originally  consisted  of  5,000  families  in  1968.44  The 
original  families,  and  new  households  that  have  grown  out  of  those  in  the  original  sample,  have  been  followed 
each  year  since.  Consequently,  the  PSID  provides  a  unique  data  set  for  studying  family-related  issues.  The 
number  of  individuals  covered  by  the  PSID  increased  from  18,000  in  1968  to  a  cumulative  total  exceeding 
40,000  in  1996,  and  the  number  of  families  increased  to  nearly  8,000.  Brown,  Duncan,  and  Stafford  (1996) 
note  that  the  "central  focus  of  the  data  is  economic  and  demographic,  with  substantial  detail  on  income  sources 
and  amounts,  employment,  family  composition  changes  and  residential  location."  The  PSID  is  also  one  of  the 
few data sets that contain information on consumption and wealth. A recent paper by Fitzgerald, Gottschalk
and Moffitt (1998) finds that, despite attrition of nearly half the sample since 1968, the PSID has remained
roughly  representative  through  1989.45 

The  accessibility  of  secondary  data  sets  is  changing  rapidly.  The  ICPSR  remains  a  major  collector  and 
distributor  of  data  sets  and  codebooks.  In  addition,  CPS  data  can  be  obtained  directly  from  the  Bureau  of 
Labor  Statistics.  Increasingly,  data  collection  agencies  are  making  their  data  directly  available  to  researchers 
via  the  internet.  In  1996,  for  example,  the  Census  Bureau  made  the  recent  March  Current  Population  Surveys, 
which  include  supplemental  information  on  annual  income  and  demographic  characteristics,  available  over 
the  internet.  Because  the  March  CPS  contains  annual  income  data,  many  researchers  have  matched  these  data 
from  one  year  to  the  next. 

Because  secondary  data  sets  are  typically  collected  for  a  broad  range  of  purposes  or  for  a  purpose  other 
than  that  intended  by  the  researcher,  they  often  lack  information  required  for  a  particular  project.  For  example, 
the PSID would be ideal for a longitudinal study of the impact of personal computers on pay, except it lacks


44This paragraph is based on Brown, Duncan, and Stafford (1996).

45See also Becketti, Gould, Lillard and Welch (1988) for evidence on the representativeness of the PSID.

information on the use of personal computers. In other situations, the data collector may omit survey items
from  public-use  files  to  preserve  respondent  confidentiality.  Nonetheless,  several  large  public-use  surveys 
enable  researchers  to  add  questions,  or  will  provide  customized  extracts  with  variables  that  are  not  on  the 
public-use  file.  For  example,  Vroman  (1991)  added  supplemental  questions  to  the  CPS  on  the  utilization  of 
unemployment  insurance  benefits.  The  cost  of  adding  the  7  questions  was  $100,000.46  From  time  to  time, 
survey  organizations  also  solicit  researchers'  advice  on  new  questions  or  new  modules  to  add  to  on-going 
surveys. Since 1993, for example, the PSID has held an open competition among researchers to add
supplemental  questions  to  the  PSID. 

3.1.1  Historical  Comparability  in  the  CPS  and  Census 

Statistical  agencies  are  often  faced  with  a  tradeoff  between  adjusting  questions  to  make  them  more 
relevant  for  the  modern  economy  and  maintaining  historical  comparability.  Often  it  seems  that  statistical 
agencies  place  insufficient  weight  on  historical  consistency.  For  example,  after  50  years  of  measuring 
education  by  the  highest  grade  of  school  individuals  attended  and  completed,  the  Census  Bureau  switched  to 
measuring  educational  attainment  by  the  highest  degree  attained  in  the  1990  Census.  The  CPS  followed  suit 
in  1992.  This  is  a  subtle  change  in  the  education  data,  but  it  is  important  for  labor  economists  to  be  aware  of, 
and  could  potentially  affect  estimates  of  the  economic  return  to  education  (see  Park,  1994  and  Jaeger,  1993). 
Because  many  statistics  are  most  informative  in  comparison  to  their  values  in  earlier  years,  it  is  important  that 
statistical  agencies  place  weight  on  historical  comparability  even  though  the  concepts  being  measured  may 
have  changed  over  time. 

Fortunately,  the  Bureau  of  Labor  Statistics  and  the  Census  Bureau  typically  introduce  a  major  change 
in  a  questionnaire  after  studying  the  likely  effects  of  the  change  on  the  survey  results.  Because  some  changes 


46Because of concern that the additional questions might affect future responses, the supplement was only asked of
individuals who were in their final rotation in the sample. The supplement was added to the survey in the months of
May, August, November 1989 and February 1990. The sample size was 2,859 eligible unemployed individuals.

have a major impact on certain variables (or on certain populations), it is important that analysts be aware of
changes  in  on-going  surveys,  and  of  their  likely  effects.    For  example,  a  major  redesign  of  the  CPS  was 
introduced  in  January  1994,  after  eight  years  of  study.  The  re-designed  CPS  illustrates  the  importance  of  being 
aware  of  questionnaire  changes,  as  well  as  the  difficulty  of  estimating  the  likely  impact  of  such  changes. 

The  redesigned  CPS  is  conducted  with  computer-assisted  interviewing  technology,  which  facilitates 
more  complicated  skip  patterns,  more  narrowly  tailored  questions,  and  dependent  interviewing  (in  which 
respondents'  answers  to  an  earlier  month's  question  are  integrated  into  the  current  month's  question).  In 
addition,  the  redesign  changed  the  way  key  labor  force  variables  were  collected.  Most  importantly,  individuals 
who  are  not  working  are  now  probed  more  thoroughly  for  activities  that  they  may  have  done  to  search  for 
work.  In  the  older  survey,  interviewers  were  instructed  to  ask  a  respondent  who  "appears  to  be  a  homemaker" 
whether  she  was  keeping  house  most  of  last  week  or  doing  something  else.  The  new  question  is  gender 
neutral.  Another  major  change  concerns  the  earnings  questions.  Prior  to  the  redesign,  the  CPS  asked 
respondents  for  their  usual  weekly  wage  and  usual  weekly  hours.47  The  ratio  of  these  two  variables  gives  the 
implied  hourly  wage.  The  redesigned  CPS  first  asks  respondents  for  the  easiest  way  they  could  report  their 
total  earnings  on  their  main  job  (e.g.,  hourly,  weekly,  annually,  or  on  some  other  basis),  and  then  collects  usual 
earnings  on  that  basis. 

To gauge the impact of the survey redesign on responses, in 1992 and 1993 the BLS and Census
Bureau  conducted  an  overlap  survey  in  which  a  separate  sample  of  households  was  given  the  redesigned  CPS, 
while  the  regular  sample  was  still  given  the  old  CPS  questionnaire.  Then,  for  the  first  five  months  of  1994, 
this  overlap  sample  was  given  the  old  CPS,  while  the  regular  sample  was  given  the  new  one.  Overlap  samples 
can  be  extremely  informative,  but  they  are  also  difficult  to  implement  properly.  In  this  instance,  the  overlap 
sample  was  drawn  with  different  procedures  than  the  regular  CPS  sample,  and  there  appear  to  be  systematic 
differences  between  the  two  samples  which  complicate  comparisons.  Taking  account  of  these  difficulties, 


47The old CPS also collected hourly earnings for workers who indicated they were paid hourly.

Polivka (1996) and Polivka and Miller (1995) estimate that the redesign had an insignificant effect on the
unemployment rate, although it appears to have raised the employment-to-population ratio of women by 1.6
percent,  raised  the  proportion  of  self-employed  women  by  20  percent,  increased  the  proportion  of  all  workers 
who  are  classified  as  part-time  by  10  percent,  and  decreased  the  fraction  of  discouraged  workers  (i.e.,  those 
out  of  the  labor  force  who  have  given  up  searching  for  work  because  they  believe  no  jobs  are  available  for 
them)  by  50  percent.  Polivka  (1997)  addresses  the  effect  of  the  redesign  on  the  derived  hourly  wage  rate.  She 
finds  that  the  redesign  causes  about  a  5  percent  increase  in  the  average  earnings  of  college  graduates  relative 
to  those  who  failed  to  complete  high  school,  and  about  a  2  percent  increase  in  the  male-female  gap.  If 
researchers  are  not  aware  of  the  potential  changes  in  measurement  brought  about  by  the  redesigned  CPS,  they 
could  spuriously  attribute  shifts  in  employment  or  wages  to  economic  forces  rather  than  to  changes  in  the 
questionnaire  and  survey  technology. 

Three  other  changes  in  the  CPS  are  especially  noteworthy.  First,  beginning  in  1980  the  Annual 
Demographic  Supplement  of  the  March  CPS  was  expanded  to  ask  a  more  probing  set  of  income  questions. 
The impact of these changes can be estimated because the 1979 March CPS administered the old (pre-1980)
questionnaire  to  five  of  the  eight  rotation  groups  in  the  sample,  and  administered  the  new,  more  detailed 
questionnaire  to  the  other  three  rotation  groups.48  Second,  as  noted  above,  the  education  question  (which  is 
on  the  "control  card"  rather  than  the  basic  monthly  questionnaire)  was  switched  from  the  number  of  years  of 
school  completed  to  the  highest  degree  attained  in  1992  (see  Park,  1994  and  Jaeger,  1993).  Third,  the  "top 
code" of the income and earnings questions -- that is, the highest level of income allowed to be reported in the
public-use file -- has changed over time, which obviously may have implications for studies of income
inequality. 


48See Krueger (1990a) for an analysis of the effect of the change in the questionnaire on responses to the question on workers'
compensation benefits. The new questionnaire seems to have detected 20 percent more workers' compensation
recipients. See Coder and Scoon-Rogers (1996) for a comparison of CPS and SIPP income measures.

3.2 Primary data collection and survey methods

It  is  becoming  increasingly  common  for  labor  economists  to  be  involved  in  collecting  their  own  data. 
Labor  economists'  involvement  in  the  design  and  collection  of  original  data  sets  takes  many  forms.  First,  it 
should  be  noted  that  labor  economists  have  long  played  a  major  role  in  the  design  and  collection  of  some  of 
the major public-use data files, including the PSID and NLS.

Second,  researchers  have  turned  to  collecting  smaller,  customized  data  to  estimate  specific  quantities 
or describe certain economic phenomena. Some of Richard Freeman's research illustrates this approach.
Freeman  and  Hall  (1986)  conducted  a  survey  to  estimate  the  number  of  homeless  people  in  the  U.S.,  which 
came  very  close  to  the  official  Census  Bureau  estimate  in  1990.  Borjas,  Freeman  and  Lang  (1991)  conducted 
a  survey  of  border  crossing  behavior  of  illegal  aliens  to  estimate  the  number  of  illegal  aliens  in  the  U.S. 
Freeman  (1990)  conducted  a  survey  of  inner-city  youths  in  Boston,  which  in  part  is  a  follow-up  on  the  survey 
conducted  by  Freeman  and  Holzer  (1986).  Often,  data  collected  in  these  surveys  are  combined  with  secondary 
data  files  to  derive  national  estimates. 

Third,  some  surveys  have  been  conducted  to  probe  the  sensitivity  of  results  in  large-scale  secondary 
data  sets,  or  to  probe  the  sensitivity  of  responses  to  question  wording  or  order.  For  example,  Farber  and 
Krueger  (1993)  conducted  a  survey  of  102  households  in  which  non-union  respondents  were  asked  two 
different  questions  concerning  their  likelihood  of  joining  a  union,  with  the  order  of  the  questions  randomly 
interchanged.  The  two  questions,  which  are  listed  below,  were  included  in  earlier  surveys  conducted  by  the 
Canadian  Federation  of  Labor  (CFL)  and  the  American  Federation  of  Labor-Congress  of  Industrial 
Organizations  (AFL-CIO),  and  had  been  analyzed  by  Riddell  (1992).  Based  on  comparing  responses  to  these 
questions,  Riddell  concluded  that  American  workers  have  a  higher  "frustrated  demand"  for  unions. 


CFL  Q.:  Thinking  about  your  own  needs,  and  your  current  employment  situation  and 
expectations,  would  you  say  that  it  is  very  likely,  somewhat  likely,  not  very  likely,  or  not 
likely  at  all  that  you  would  consider  joining  or  associating  yourself  with  a  union  or  a 
professional association in the future?

AFL Q.: If an election were held tomorrow to decide whether your workplace would be
unionized  or  not,  do  you  think  you  would  definitely  vote  for  a  union,  probably  vote  for  a 
union,  probably  vote  against  a  union,  or  definitely  vote  against  a  union? 


In  their  small-scale  survey,  Farber  and  Krueger  (1993)  found  that  the  responses  to  the  CFL  question  were 
extremely  sensitive  to  the  questions  that  preceded  them.  If  the  AFL  question  was  asked  first,  55%  of  nonunion 
members  answered  the  CFL  question  affirmatively,  but  if  the  CFL  question  was  asked  first,  26%  of  nonunion 
members  answered  affirmatively  to  the  CFL  question.49  Thus,  the  Farber  and  Krueger  results  suggest  a  good 
deal  of  caution  in  interpreting  the  CFL-style  question,  especially  across  countries. 
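
A t-ratio for a difference between two proportions of this kind can be computed as in the sketch below. The two proportions are taken from the text, but the text does not report how many respondents received each question ordering, so the group sizes used here are hypothetical placeholders and the result need not match the t-ratio the authors report. The sketch uses the unpooled standard error, which is our choice rather than a documented one.

```python
import math

def diff_in_proportions_t(p1: float, n1: int, p2: float, n2: int) -> float:
    """t-ratio for p1 - p2 using the unpooled standard error."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Proportions from the text; group sizes are HYPOTHETICAL (not reported).
share_afl_first = 0.55   # answered the CFL question affirmatively, AFL asked first
share_cfl_first = 0.26   # answered the CFL question affirmatively, CFL asked first
hypothetical_n = 50      # assumed respondents per ordering, for illustration only

t_example = diff_in_proportions_t(share_afl_first, hypothetical_n,
                                  share_cfl_first, hypothetical_n)
print(f"t-ratio under the assumed group sizes: {t_example:.2f}")
```

Even with groups of this modest size, a gap of 29 percentage points is large relative to its sampling error, which is why a small survey suffices to detect an order effect of this magnitude.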

Finally,  and  of  most  interest  for  our  purposes,  researchers  have  conducted  special-purpose  surveys  to 
evaluate  certain  natural  experiments.  Probably  the  best  known  example  of  this  type  of  survey  is  Card  and 
Krueger's  (1994)  survey  of  fast  food  restaurants  in  New  Jersey  and  Pennsylvania.  Other  examples  include: 
Ashenfelter  and  Krueger's  (1994)  survey  of  twins;  Behrman,  Rosenzweig  and  Taubman's  (1996)  survey  of 
twins; Mincer and Higuchi's (1988) survey of turnover at Japanese plants in the U.S. and their self-identified
competitors;  and  Freeman  and  Kleiner's  (1990)  survey  of  companies  undergoing  a  union  drive  and  their 
competitors. 

Several  excellent  volumes  have  been  written  on  the  design  and  implementation  of  surveys,  and  a 
detailed  overview  of  this  material  is  beyond  the  scope  of  this  paper.50  But  a  few  points  that  may  be  of  special 
interest  to  labor  economists  are  outlined  below. 

Customized  surveys  seem  especially  appropriate  for  rare  populations,  which  are  likely  to  be  under- 
represented  or  not  easily  identified  in  public-use  data  sets.  Examples  include  identical  twins,  illegal  aliens, 
homeless  people,  and  disabled  people. 

To  conduct  a  survey,  one  must  obviously  have  a  questionnaire.  Preparing  a  questionnaire  can  be  a 
time-consuming and difficult endeavor. Survey researchers often find that answers to questions -- even factual


49The t-ratio for the difference between the proportions is 3.3.

50See, for example, Groves (1989), Sudman and Bradburn (1991), and Singer and Presser (1989).

economic questions -- are sensitive to the wording and ordering of questions. Fortunately, one does not have
to  begin  writing  a  questionnaire  from  scratch.  Survey  questionnaires  typically  are  not  copyright  protected. 
Because  many  economists  are  familiar  with  existing  questionnaires  used  in  the  major  secondary  data  sets  (e.g., 
the  CPS),  and  because  a  great  deal  of  effort  typically  goes  into  designing  and  testing  these  questionnaires,  it 
is  often  advisable  to  copy  as  many  questions  as  possible  verbatim  from  existing  questionnaires  when 
formulating  a  new  questionnaire.  Aside  from  the  credibility  gained  from  replicating  questions  from  well 
known  surveys,  another  advantage  of  duplicating  others'  questions  is  that  the  results  from  the  sampled 
population  can  be  compared  directly  to  the  population  as  a  whole  with  the  secondary  survey.  Furthermore, 
if data from a customized survey are pooled together with data from a secondary survey, it is essential that the
questions  be  comparable. 

One  promising  recent  development  in  questionnaire  design  involves  "follow-up  brackets"  (also  known 
as  "unfolding"  brackets).  This  technique  offers  bracketed  categories  to  respondents  who  initially  refuse  or 
are  unable  to  provide  an  exact  value  to  an  open  ended  question.  Juster  and  Smith  (1997)  find  that  follow-up 
brackets  reduced  nonresponse  to  wealth  questions  in  the  Health  and  Retirement  Survey  (HRS)  and  Asset  and 
Health  Dynamics  among  the  Oldest  Old  Survey  (AHEAD).  See  Hurd,  et  al.  (1998)  for  experimental  evidence 
of  anchoring  in  responses  based  on  the  sequence  of  unfolding  brackets  for  consumption  and  savings  data  in 
the  AHEAD  survey.  Follow-up  brackets  have  also  been  used  to  measure  wealth  in  the  PSID.  The  use  of 
follow-up  brackets  would  seem  particularly  useful  for  hard-to-measure  quantities,  such  as  income,  wealth, 
saving  and  consumption. 
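
The logic of follow-up brackets can be sketched as a short sequence of "is it more than X?" questions that narrows a refused amount to an interval. The thresholds, the function name, and the use of bisection to choose the next question are all assumptions made for illustration; actual surveys fix their own entry points and question sequences, and the anchoring evidence cited above suggests that this choice matters.

```python
from typing import Callable, List, Optional, Tuple

def unfolding_bracket(
    thresholds: List[float],
    exceeds: Callable[[float], Optional[bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Narrow an unreported amount to a bracket with yes/no questions.

    `thresholds` must be sorted in ascending order. `exceeds(t)` stands for
    asking "Is it more than t?" and returns True, False, or None (refusal).
    Returns (lower, upper); None marks an open end of the bracket.
    """
    low, high = 0, len(thresholds)
    lower: Optional[float] = None
    upper: Optional[float] = None
    while low < high:
        mid = (low + high) // 2
        answer = exceeds(thresholds[mid])
        if answer is None:
            break                      # respondent stops; keep the bracket so far
        if answer:
            lower = thresholds[mid]    # amount is above this threshold
            low = mid + 1
        else:
            upper = thresholds[mid]    # amount is at or below this threshold
            high = mid
    return lower, upper

# Hypothetical wealth thresholds and a respondent whose true amount is 12,000.
wealth_thresholds = [1_000.0, 5_000.0, 25_000.0, 100_000.0]
bracket = unfolding_bracket(wealth_thresholds, lambda t: 12_000.0 > t)
print(bracket)   # (5000.0, 25000.0)
```

A respondent who refuses partway through still yields a usable, if wider, bracket, which is the sense in which the technique reduces item nonresponse.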

Lastly,  power  calculations  should  guide  the  determination  of  sample  size  prior  to  the  start  of  a  survey. 
For  example,  suppose  the  goal  of  the  survey  is  to  estimate  a  95%  confidence  interval  for  a  mean.  With  random 
sampling, the expected sample size (n) required to obtain a confidence interval of width 2W is: $n = 8\sigma^2/W^2$,
where $\sigma^2$ is the population variance of the variable in question. Although the variance generally will not be
known prior to conducting the survey, an estimate from other surveys can be used for the power calculation.
Also notice that in the case of a binary variable (i.e., if the goal is to estimate a proportion, p), the variance is
p(1-p), so in the worst-case scenario the variance is .25 = .5 * .5. It should also be noted that in complex
sample designs involving clustering and stratification, more observations are usually needed than in simple
random samples to attain a given level of precision.
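
A sample-size calculation of this kind can be sketched as below, under the textbook normal approximation in which a 95% interval has half-width 1.96 standard errors. With that critical value the constant multiplying the variance over W squared is about 3.84, which differs from the constant printed in the text; the sketch makes no attempt to reconcile the two. The `design_effect` argument is our own addition, a simple multiplier standing in for the extra observations that clustered or stratified designs require.

```python
import math

def sample_size_for_mean(variance: float, half_width: float,
                         z: float = 1.96, design_effect: float = 1.0) -> int:
    """Smallest n with z * sqrt(design_effect * variance / n) <= half_width."""
    return math.ceil(design_effect * z ** 2 * variance / half_width ** 2)

def sample_size_for_proportion(half_width: float, p: float = 0.5,
                               z: float = 1.96, design_effect: float = 1.0) -> int:
    """Same calculation for a proportion, whose variance is p * (1 - p).

    The default p = 0.5 is the worst case, giving a variance of 0.25.
    """
    return sample_size_for_mean(p * (1 - p), half_width, z, design_effect)

# Worst-case proportion, interval of plus or minus 3 percentage points.
print(sample_size_for_proportion(0.03))                      # 1068
# Same target with an assumed design effect of 2 (hypothetical value).
print(sample_size_for_proportion(0.03, design_effect=2.0))   # 2135
```

Because the required sample size falls with the square of the half-width, halving the width of the interval roughly quadruples the number of observations needed.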

3.3  Administrative  data  and  record  linkage 

Administrative  data,  i.e.,  data  produced  as  a  by-product  of  some  administrative  function,  often  provide 
inexpensive  large  samples.  The  proliferation  of  computerized  record  keeping  in  the  last  decade  should  increase 
the  number  of  administrative  data  sets  available  in  the  future.  Examples  of  widely  used  administrative  data 
bases  include  social  security  earnings  records  (Ashenfelter  and  Card,  1985,  Vroman,  1990,  Angrist,  1990), 
unemployment  insurance  payroll  and  benefit  records  (Anderson,  1993,  Katz  and  Meyer,  1990,  Jacobson, 
Lalonde,  and  Sullivan,  1994,  Card  and  Krueger,  1998),  workers' compensation  insurance  records  (Meyer, 
Viscusi  and  Durbin,  1995,  and  Krueger,  1990b),  company  personnel  records  (Medoff  and  Abraham,  1980, 
Lazear,  1992,  Baker,  Gibbs  and  Holmstrom,  1994),  and  college  records  (Bowen  and  Bok,  1998).  An 
advantage  of  administrative  data  is  that  they  often  contain  enormous  samples  or  even  an  entire  population. 
Another  advantage  is  that  administrative  data  often  contain  the  actual  information  used  to  make  economic 
decisions.  Thus,  administrative  data  may  be  particularly  useful  for  identifying  causal  effects  from  discrete 
thresholds  in  administrative  decision  making,  or  for  implementing  strategies  that  control  for  selection  on 
observed  characteristics. 

A  frequent  limitation  of  administrative  data,  however,  is  that  they  may  not  provide  a  representative 
sample  of  the  relevant  population.  For  example,  companies  that  are  willing  to  make  their  personnel  records 
available  are  probably  not  representative  of  all  companies.  In  some  cases  administrative  data  have  even  been 
obtained  as  a  by-product  of  court  cases  or  collected  by  parties  with  a  vested  interest  in  the  outcome  of  the 
research, in which case there is additional reason to be concerned about the representativeness of the samples.

Another common limitation of administrative data is that they are not generated with research purposes
in  mind,  so  they  may  lack  key  variables  used  in  economic  analyses.  For  example,  social  security  earnings 
records  lack  data  on  individuals' education.  As  a  consequence,  it  is  common  for  researchers  to  link  survey  data 
to  administrative  data,  or  to  link  across  administrative  data  sets.  Often  these  links  are  based  on  social  security 
numbers  or  the  individuals'  names.  Examples  of  linked  data  sets  include:  the  Continuous  Longitudinal 
Manpower Survey (CLMS), which is a link between social security records and the 1976 CPS; the 1973
Exact  Match  file  which  contains  CPS,  IRS,  and  social  security  data;  and  the  Longitudinal  Employer-Employee 
Data  Set  (LEEDS).  All  of  these  linked  data  sets  are  now  dated,  but  they  can  still  be  used  for  some  important 
historical  studies  (e.g.,  Chay,  1996).  More  recently,  the  Census  Bureau  has  been  engaged  in  a  project  to  link 
Census data to the Survey of Manufacturers.
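
The mechanics of linking survey responses to administrative records on a common identifier can be sketched as follows. The records, field names, and identifiers are invented for illustration and do not correspond to any actual file layout; real linkage projects must also handle confidentiality restrictions and imperfect identifiers, which this sketch ignores apart from discarding ambiguous keys.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]

def link_records(survey: List[Record], admin: List[Record],
                 key: str = "id") -> Tuple[List[Record], List[Record]]:
    """Link survey rows to administrative rows that share the same `key`.

    Returns (linked, unmatched_survey). Administrative keys that appear more
    than once are treated as ambiguous and are not used for linking.
    """
    counts: Dict[object, int] = {}
    for row in admin:
        counts[row[key]] = counts.get(row[key], 0) + 1
    admin_by_key = {row[key]: row for row in admin if counts[row[key]] == 1}

    linked: List[Record] = []
    unmatched: List[Record] = []
    for row in survey:
        match = admin_by_key.get(row.get(key))
        if match is None:
            unmatched.append(row)
        else:
            linked.append({**match, **row})   # survey values win on conflicts
    return linked, unmatched

# Invented example: the survey has education, the administrative file has earnings.
survey_rows = [{"id": "A1", "education": 12}, {"id": "A2", "education": 16},
               {"id": "A9", "education": 10}]
admin_rows = [{"id": "A1", "earnings": 21000}, {"id": "A2", "earnings": 38000},
              {"id": "A3", "earnings": 15000}]

linked_rows, unmatched_rows = link_records(survey_rows, admin_rows)
print(len(linked_rows), len(unmatched_rows))   # 2 1
```

Keeping track of the unmatched survey records is useful in its own right, since nonrandom match failure is one way a linked file can become unrepresentative.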

It  is  also  possible  to  petition  government  agencies  to  release  administrative  data.  Although  the  Internal 
Revenue Service severely limits disclosure of federal administrative records collected for tax purposes, state
data are often accessible and even federal data can still be linked and released under some circumstances. For
example,  Angrist  (1998)  linked  military  personnel  records  to  Social  Security  Administration  (SSA)  data.  The 
HRS  has  also  successfully  linked  SSA  data  to  survey-based  data.  Furthermore,  many  states  provide  fairly  free 
access  to  UI  payroll  tax  data  to  researchers  for  the  purpose  of  linking  data.51  There  is  also  a  literature  on  data 
release  schemes  for  administrative  records  that  preserve  confidentiality  and  meet  legal  requirements  (see,  e.g., 
Duncan  and  Pearson,  1991). 

3.4  Combining  samples 

Although  in  some  cases  individual  records  can  be  linked  between  different  data  sources,  an  alternative 
linkage  strategy  exploits  the  fact  that  many  of  the  estimators  used  in  empirical  research  can  be  constructed  from 
separate  sets  of  first  and  second  moments.  So,  in  principle,  individual  records  with  a  full  complement  of 


51An example is Krueger and Kruse (1996), which links New Jersey unemployment insurance payroll tax data to a
data set the authors collected in a survey of disabled individuals.

variables are not always needed to carry out a multivariate analysis. It is sometimes enough to have all the
moments  required,  even  though  these  moments  may  be  drawn  from  more  than  one  sample.  In  practice,  this 
makes  it  possible  to  undertake  empirical  projects  even  if  the  required  data  are  not  available  in  any  single 
source. 

Recent  versions  of  the  multiple-sample  approach  to  empirical  work  include  the  two-sample 
instrumental  variables  estimators  developed  by  Arellano  and  Meghir  (1992)  and  Angrist  and  Krueger  (1992, 
1995), and used by Lusardi (1996), Jappelli, Pischke, and Souleles (1998), and Kling (1998). The use of two
samples  to  estimate  regression  coefficients  dates  back  at  least  to  Durbin  (1953),  who  discussed  the  problem 
of  how  to  update  OLS  estimates  with  information  from  a  new  sample.  Maddala  (1971)  discussed  a  similar 
problem  using  a  maximum  likelihood  framework.  This  idea  was  recently  revived  by  Imbens  and  Lancaster 
(1994),  who  address  the  problem  of  how  to  use  macroeconomic  data  in  micro-econometric  models.  Deaton 
(1985)  focuses  on  estimating  panel  data  models  with  aggregate  data  on  cohorts. 
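
The idea that an estimator can be assembled from moments drawn from separate samples can be illustrated with a scalar version of two-sample instrumental variables. In the sketch below one simulated sample contains the outcome and the instrument, a second independent sample contains the regressor and the instrument, and the estimate is the ratio of the two sample covariances. The data-generating process, parameter values, and function names are assumptions made for the illustration and are not taken from any of the papers cited above.

```python
import random
from typing import List, Tuple

def sample_cov(a: List[float], b: List[float]) -> float:
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def two_sample_iv(z1: List[float], y1: List[float],
                  z2: List[float], x2: List[float]) -> float:
    """Scalar two-sample IV: cov(z, y) from sample 1 over cov(z, x) from sample 2."""
    return sample_cov(z1, y1) / sample_cov(z2, x2)

def simulate(n: int, beta: float,
             rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Draw (z, x, y) where x is confounded by an unobserved v but z is not."""
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)            # instrument
        vi = rng.gauss(0.0, 1.0)            # unobserved confounder
        xi = zi + vi                        # regressor depends on both
        yi = beta * xi + vi + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

rng = random.Random(0)
true_beta = 2.0
z1, _, y1 = simulate(20_000, true_beta, rng)   # sample 1: x is discarded
z2, x2, _ = simulate(20_000, true_beta, rng)   # sample 2: y is discarded

estimate = two_sample_iv(z1, y1, z2, x2)
print(f"two-sample IV estimate: {estimate:.2f} (true value {true_beta})")
```

No individual in either sample has all three variables observed, yet the coefficient is recovered because both samples are drawn from the same population and share the instrument.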

4.  Measurement  Issues 

In his classic volume on the accuracy of economic measurement, Oskar Morgenstern (1950) quotes
the  famed  mathematician  Norbert  Wiener  as  remarking,  "Economics  is  a  one  or  two  digit  science."  The  fact 
that  the  focus  of  most  empirical  research  has  moved  from  aggregate  time-series  data  to  micro-level  cross- 
sectional  and  longitudinal  survey  data  in  recent  years  only  magnifies  the  importance  of  measurement  error, 
because  (random)  errors  tend  to  average  out  in  aggregate  data.  Consequently,  a  good  deal  of  attention  has 
been  paid  to  the  extent  and  impact  of  "noisy"  data  in  the  last  decade,  and  much  has  been  learned. 

Measurement  error  can  arise  for  several  reasons.  In  survey  data,  a  common  source  of  measurement 
error  is  that  respondents  give  faulty  answers  to  the  questions  posed  to  them.52  For  example,  some  respondents 


52Even well-trained economists can make errors of this sort. Harvard's Dean of Faculty Henry Rosovsky (1991, p.
40)  gives  the  following  account  of  a  meeting  he  had  with  an  enraged  economics  professor  who  complained  about  his 
salary:  "After  a  quick  calculation,  this  quantitatively  oriented  economist  concluded  that  his  raise  was  all  of  1  percent: 
an  insult  and  an  outrage.  I  had  the  malicious  pleasure  of  correcting  his  mistaken  calculation.  The  raise  was  6 
may intentionally exaggerate their income or educational attainment to impress the interviewer, or they may
shield some of their income from the interviewer because they are concerned the data may somehow fall into
the hands of the IRS, or they may unintentionally forget to report some income, or they may misinterpret the
question, and so on. Even in surveys like SIPP, which is specifically designed to measure participation in
public  programs  like  UI  and  AFDC,  respondents  appear  to  under-report  program  participation  by  20  to  40 
percent  (see  Marquis,  Moore  and  Bogen,  1996).  It  should  also  be  stressed  that  in  many  situations,  even  if  all 
respondents  correctly  answer  the  interviewers'  questions,  the  observed  data  need  not  correspond  to  the  concept 
that  researchers  would  like  to  measure.  For  example,  in  principle,  human  capital  should  be  measured  by 
individuals'  acquired  knowledge  or  skills;  in  practice  it  is  measured  by  years  of  schooling.53 

For  these  reasons,  it  is  probably  best  to  think  of  data  as  being  routinely  mismeasured.  Although  few 
economists  consider  measurement  error  the  most  exciting  research  topic  in  economics,  it  can  be  of  much 
greater  practical  significance  than  several  hot  issues.  Topel  (1991),  for  example,  provides  evidence  that 
failure  to  correct  for  measurement  error  greatly  affects  the  estimated  return  to  job  tenure  in  panel  data  models. 
Fortunately,  the  direction  of  biases  caused  by  measurement  error  can  often  be  predicted.  Moreover,  in  many 
situations  the  extent  of  measurement  error  can  be  estimated,  and  the  parameters  of  interest  can  be  corrected 
for  biases  caused  by  measurement  error. 

4.1  Measurement  Error  Models 
4.1.1  The  Classical  Model 

Suppose we have data on variables denoted $X_i$ and $Y_i$ for a sample of individuals. For example, $X_i$
could be years of schooling and $Y_i$ log earnings. The variables $X_i$ and $Y_i$ may or may not equal the
correctly-measured variables the researcher would like to have data on, which we denote $X_i^*$ and $Y_i^*$. The error in


percent:  he  did  not  know  his  own  salary  and  had  used  the  wrong  base." 

53Measurement error arising from the mismatch between theory and practice also occurs in administrative data. In
fact,  this  may  be  a  more  severe  problem  in  administrative  data  than  in  survey  data. 
measuring the variables is simply the deviation between the observed variable and the correctly-measured
variable: for example, $e_i = X_i - X_i^*$, where $e_i$ is the measurement error in $X_i$. Considerations of measurement
error usually start with the assumption of "classical" measurement errors.54 Under the classical assumptions,
$e_i$ is assumed to have the properties $C(e_i, X_i^*) = E(e_i) = 0$. That is, the measurement error is just mean-zero
"white  noise".  Classical  measurement  error  is  not  a  necessary  feature  of  measurement  error;  rather,  these 
assumptions  are  best  viewed  as  a  convenient  starting  point. 

What  are  the  implications  of  classical  measurement  error?  First,  consider  a  situation  in  which  the 
dependent variable is measured with error. Specifically, suppose that $Y_i = Y_i^* + u_i$, where $Y_i$ is the observed
dependent variable, $Y_i^*$ is the correctly-measured, desired, or "true" value of the dependent variable, and $u_i$
is classical measurement error. If $Y_i$ is regressed on one or more correctly-measured explanatory variables,
the expected value of the coefficient estimates is not affected by the presence of the measurement error.
Classical measurement error in the dependent variable leads to less precise estimates (because the errors will
inflate the standard error of the regression) but does not bias the coefficient estimates.55

Now  consider  the  more  interesting  case  of  measurement  error  in  an  explanatory  variable.    For 
simplicity,  we  focus  on  a  bivariate  regression,  with  mean  zero  variables  so  we  can  suppress  the  intercept. 
Suppose $Y_i^*$ is regressed on the observed variable $X_i$, instead of on the correctly-measured variable $X_i^*$. The
population regression of $Y_i^*$ on $X_i^*$ is:

(45) $Y_i^* = X_i^* \beta + \epsilon_i$,

while if we make the additional assumption that the measurement error ($e_i$) and the equation error ($\epsilon_i$) are
uncorrelated, the population regression of $Y_i^*$ on $X_i$ is:

(46) $Y_i^* = X_i \lambda \beta + \tilde{\epsilon}_i$,


54References for the effect of measurement error include Duncan and Hill (1985), Griliches (1986), Fuller (1987),
and Bound and Krueger (1991).

55If the measurement error in the dependent variable is not classical, then the regression coefficients will be biased.
The  bias  will  equal  the  coefficients  from  a  hypothetical  regression  of  the  measurement  error  on  the  explanatory 
variables. 
where $\lambda = C(X^*, X) / V(X)$. If $X_i$ is measured with classical measurement error, then $C(X^*, X) = V(X^*)$ and
$V(X) = V(X^*) + V(e)$, so the regression coefficient is necessarily attenuated, with the proportional "attenuation
bias" equal to $(1-\lambda) < 1$.56 The quantity $\lambda$ is often called the "reliability ratio". If data on both $X_i^*$ and $X_i$ were
available, the reliability ratio could be estimated from a regression of $X_i^*$ on $X_i$. A higher reliability ratio
implies that the observed variability in $X_i$ contains less noise.
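
The attenuation result can be checked with a small simulation. The sketch below is illustrative only: the sample size, the true coefficient of 1, and the variances $V(X^*) = 1$ and $V(e) = 0.25$ are assumed values, and the helper names `cov` and `slope` are hypothetical. It draws classical measurement error and compares the slope from a regression of $Y^*$ on the noisy $X$ with the reliability ratio (written `lam` in the code) times the true coefficient.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope from a bivariate regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(0)             # fixed seed so the run is reproducible
n, beta = 100_000, 1.0             # assumed sample size and true coefficient
var_signal, var_noise = 1.0, 0.25  # assumed V(X*) and V(e)

x_true = [rng.gauss(0, var_signal ** 0.5) for _ in range(n)]
x_obs = [x + rng.gauss(0, var_noise ** 0.5) for x in x_true]  # X = X* + e
y_true = [beta * x + rng.gauss(0, 1) for x in x_true]         # equation (45)

lam_theory = var_signal / (var_signal + var_noise)  # V(X*) / [V(X*) + V(e)]
lam_hat = slope(x_true, x_obs)   # reliability ratio: regress X* on X
b_obs = slope(y_true, x_obs)     # slope using the mismeasured regressor

print(f"theoretical lambda = {lam_theory:.3f}")
print(f"estimated lambda   = {lam_hat:.3f}")
print(f"slope of Y* on X   = {b_obs:.3f} (lambda * beta = {lam_theory * beta:.3f})")
```

With these assumed variances the reliability ratio is 0.8, so the slope on the noisy regressor should settle near 0.8 rather than 1.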

Although  classical  measurement  error  models  provide  a  convenient  starting  place,  in  some  important 
situations classical measurement error is impossible. If $X_i$ is a binary variable, for example, then it must be
the case that measurement errors in $X_i$ are dependent on the values of $X_i^*$. This is because a dummy variable
can only be misclassified in one of two ways (a true 1 can be classified as a 0, and a true 0 can be classified
as a 1), so only two values of the error are possible and the error automatically depends on the true value of
the variable. An analogous situation arises with variables whose range is limited. Aigner (1972) shows that
random misclassification of a binary variable still biases a bivariate regression coefficient toward 0 even
though the resulting measurement error is not classical. But, in general, if measurement error in $X_i$ is not
classical, the bias factor could be greater than or less than one, depending on the correlation between the
measurement error and the true variable. Note, however, that regardless of whether or not the classical
measurement error assumptions are met, the proportional bias $(1-\lambda)$ is still given by one minus the regression
coefficient from a regression of $X_i^*$ on $X_i$.57

Another  important  special  case  of  non-classical  measurement  error  occurs  when  a  group  average  is 
used  as  a  "proxy-variable"  for  an  individual-level  variable  in  micro  data.  For  example,  average  wages  in  an 


56Notice these are descriptions of population regressions. The estimated regression coefficient is asymptotically
biased by a factor $(1-\lambda)$, though the bias may differ in a finite sample. If the conditional expectation of Y is linear in
X, such as in the case of normal errors, the expected value of the bias is $(1-\lambda)$ in a finite sample.

57This result requires the previously mentioned assumption that $e_i$ and $\epsilon_i$ be uncorrelated. It may also be the case that
the measurement error is not mean zero. Statistical agencies often refer to such phenomena as "non-sampling
error"  (see,  e.g.,  McCarthy,  1979).  Such  non-sampling  errors  may  arise  if  the  questionnaire  used  to  solicit 
information  does  not  pertain  to  the  economic  concept  of  interest,  or  if  respondents  systematically  under  or  over 
report  their  answers  even  if  the  questions  do  accurately  reflect  the  relevant  economic  concepts.  An  important 
implication  of  non-sampling  error  is  that  aggregate  totals  will  be  biased. 
industry or county might be substituted for individual wage rates on the right-hand side of an equation if micro
wage data are missing. Although this leads to measurement error, since the proxy-variable replaces a desired
regressor, asymptotically there is no measurement-error bias in a bivariate regression in this case. One way
to see this is to note that the coefficient from a regression of, say, $X_i$ on $E[X_i \mid \text{industry } j]$ has a probability limit
of 1.
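
A short simulation may help make the group-average result concrete. The sketch below is not taken from the paper: the number of groups, the group size, and the coefficient of 2 are assumed values, and `cov` and `slope` are hypothetical helpers. Each individual's regressor is replaced by the average in his or her group. Because the averages are computed from the same sample, the regression of the individual variable on its group mean has a slope of exactly 1 here, which is the sample counterpart of the probability limit noted above.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope from a bivariate regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(1)
n_groups, per_group, beta = 50, 400, 2.0   # assumed design and true coefficient

x, y, group = [], [], []
for g in range(n_groups):
    mu = rng.gauss(0, 1)                   # group-specific mean of X
    for _ in range(per_group):
        xi = mu + rng.gauss(0, 1)          # individual X varies around mu
        x.append(xi)
        y.append(beta * xi + rng.gauss(0, 1))
        group.append(g)

# Proxy: replace each individual's X with the average X in the group.
totals = [0.0] * n_groups
for xi, g in zip(x, group):
    totals[g] += xi
x_bar = [totals[g] / per_group for g in group]

first_stage = slope(x, x_bar)   # regression of X on its group mean
b_proxy = slope(y, x_bar)       # regression of Y on the proxy variable

print(f"slope of X on group mean = {first_stage:.6f}")
print(f"slope of Y on group mean = {b_proxy:.3f} (true beta = {beta})")
```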

So  far  the  discussion  has  considered  the  case  of  a  bivariate  regression  with  just  one  explanatory 
variable.  As  noted  in  Section  2,  adding  additional  regressors  will  typically  exacerbate  the  impact  of 
measurement  error  on  the  coefficient  of  the  mismeasured  variable  because  the  inclusion  of  additional 
independent variables absorbs some of the signal in $X_i$, and thereby reduces the residual signal to noise ratio.
Assuming that the other explanatory variables are measured without error, the reliability ratio conditional on
other explanatory variables becomes $\lambda' = (\lambda - R^2)/(1 - R^2)$, where $R^2$ is the coefficient of determination from
a regression of the mismeasured $X_i$ on the other explanatory variables. If the measurement error is classical,
then $\lambda' \le \lambda$. And even if the measurement error is not classical, it still remains true that when there are
covariates in equation (45), the proportional bias is given by one minus the coefficient on $X_i$ in a regression of $X_i^*$ on $X_i$
and  the  covariates.  Note,  however,  that  in  models  with  covariates  it  no  longer  need  be  the  case  that  the  use 
of  aggregate  proxy  variables  generates  no  asymptotic  bias. 
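
The conditional reliability formula is easy to evaluate directly. The following sketch uses a hypothetical function name and made-up inputs (an unconditional reliability of 0.90 and three values of $R^2$); it only illustrates how quickly the reliability ratio falls as the covariates explain more of the mismeasured regressor.

```python
def conditional_reliability(lam: float, r_squared: float) -> float:
    """Reliability ratio conditional on error-free covariates:
    lambda' = (lambda - R^2) / (1 - R^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return (lam - r_squared) / (1.0 - r_squared)

# Hypothetical inputs: unconditional reliability of 0.90.
for r2 in (0.0, 0.25, 0.5):
    print(f"R^2 = {r2:.2f} -> lambda' = {conditional_reliability(0.90, r2):.3f}")
```

For these inputs the ratio falls from 0.90 with no covariates to about 0.87 when $R^2 = 0.25$ and to 0.80 when $R^2 = 0.5$.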

An  additional  feature  of  measurement  error  important  for  applied  work  is  that,  for  reasons  similar  to 
those  raised  in  the  discussion  of  models  with  covariates,  attenuation  bias  due  to  classical  measurement  error 
is  generally  exacerbated  in  panel  data  models.  In  particular,  if  the  independent  variable  is  expressed  in  first 
differences and if we assume that $X_i^*$ and $e_i$ are covariance stationary, the reliability ratio is:

(47) $\lambda = V(X_i^*) / \{ V(X_i^*) + V(e_i) [(1-\tau)/(1-r)] \}$,

where $r$ is the coefficient of first-order serial correlation in $X_i^*$ and $\tau$ is the first-order serial correlation in the
measurement error. If the (positive) serial correlation in $X_i^*$ exceeds the (positive) serial correlation in the
measurement  error,  attenuation  bias  is  greater  in  first-differenced  data  than  in  cross-sectional  data  (Griliches 
and Hausman, 1986). Classical measurement errors are usually assumed to be serially uncorrelated ($\tau = 0$), in
which case the attenuation bias is greater in a first-differenced regression than in a levels regression. The
intuition for this is that some of the signal in $X_i$ cancels out in the first-difference regression because of serial
correlation in $X_i^*$, while the effect of independent measurement errors is amplified because errors can occur
in  the  first  or  second  period.  A  similar  situation  arises  if  differences  are  taken  over  dimensions  of  the  data 
other  than  time,  such  as  between  twins  or  siblings. 
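
To see how much differencing can matter, equation (47) can be evaluated for a few cases. The function names and all numerical inputs below are assumptions chosen for illustration. The formula follows because, under covariance stationarity, the variance of the differenced signal is $2V(X^*)(1-r)$ and the variance of the differenced error is $2V(e)(1-\tau)$.

```python
def reliability_levels(var_signal: float, var_noise: float) -> float:
    """Cross-sectional reliability ratio V(X*) / [V(X*) + V(e)]."""
    return var_signal / (var_signal + var_noise)

def reliability_first_diff(var_signal: float, var_noise: float,
                           r: float, tau: float = 0.0) -> float:
    """Reliability ratio of first-differenced data, equation (47).
    r is the serial correlation of the true variable, tau that of the error."""
    return var_signal / (var_signal + var_noise * (1.0 - tau) / (1.0 - r))

# Assumed values: V(X*) = 1, V(e) = 0.25, so the levels reliability is 0.8.
print(reliability_levels(1.0, 0.25))                        # levels
print(reliability_first_diff(1.0, 0.25, r=0.8, tau=0.0))    # persistent signal, white-noise error
print(reliability_first_diff(1.0, 0.25, r=0.8, tau=0.8))    # equally persistent error
```

With a serial correlation of 0.8 in the true variable and serially uncorrelated errors, the reliability falls from 0.8 in levels to about 0.44 in first differences; when the error is as persistent as the signal, differencing leaves the ratio unchanged.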

Finally,  note  that  if  an  explanatory  variable  is  a  function  of  a  mismeasured  dependent  variable,  the 
measurement  errors  in  the  dependent  and  independent  variables  are  automatically  correlated.  Borjas  (1980) 
notes  that  this  situation  often  arises  in  labor  supply  equations  where  the  dependent  variable  is  hours  worked 
and  the  independent  variable  is  average  hourly  earnings,  derived  by  dividing  weekly  or  annual  earnings  by 
hours worked. In this situation, measurement error in $Y_i$ will induce a negative bias when $(Y_i^* + u_i)$ is
regressed on $X_i^*/(Y_i^* + u_i)$. In other situations, both the dependent and independent variables may have the
same  noisy  measure  in  the  denominator,  such  as  when  the  variables  are  scaled  to  be  per  capita  (common  in  the 
economic  growth  literature).  If  the  true  regression  parameter  were  0,  this  would  bias  the  estimated  coefficient 
toward  1.  The  extent  of  bias  in  these  situations  is  naturally  related  to  the  extent  of  the  measurement  error  in 
the  variable  that  appears  on  both  the  right-hand  and  left-hand  side  of  the  equation. 
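
The "division bias" described above can be illustrated with a stylized simulation. The sketch works in logs rather than levels, which is an assumption made for convenience, as are all variances, the sample size, and the premise that earnings are reported without error. True log hours and true log wages are generated independently, so the true slope is 0; the error in reported hours enters the constructed wage with the opposite sign and produces a negative estimated slope.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope from a bivariate regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(2)
n = 100_000
var_wage, var_err = 0.20, 0.05   # assumed V(true log wage) and V(u)

log_hours_true = [rng.gauss(0, 0.3) for _ in range(n)]
log_wage_true = [rng.gauss(0, var_wage ** 0.5) for _ in range(n)]  # independent of hours
u = [rng.gauss(0, var_err ** 0.5) for _ in range(n)]               # error in reported log hours

log_hours_obs = [h + e for h, e in zip(log_hours_true, u)]
# Earnings are assumed correctly reported, so
# log(earnings / reported hours) = true log wage - u.
log_wage_obs = [w - e for w, e in zip(log_wage_true, u)]

b = slope(log_hours_obs, log_wage_obs)
b_theory = -var_err / (var_wage + var_err)   # -V(u) / [V(wage*) + V(u)]

print(f"estimated slope = {b:.3f}, implied by the variances = {b_theory:.3f}")
```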

4.1.2  Instrumental  Variables  and  Measurement  Error 

One  of  the  earliest  uses  of  IV  was  as  a  technique  to  overcome  errors-in-variables  problems.  For 
example,  in  his  classic  work  on  the  permanent  income  hypothesis,  Friedman  (1957)  argued  that  annual  income 
is  a  noisy  measure  of  permanent  income.  The  grouped  estimator  he  used  to  overcome  measurement  errors  in 
permanent  income  can  be  thought  of  as  IV.  It  is  now  well  known  that  IV  yields  consistent  parameter  estimates 
even  if  the  endogenous  regressor  is  measured  with  classical  error,  assuming  that  a  valid  instrument  exists. 
Indeed,  one  explanation  why  IV  estimates  of  the  return  to  schooling  frequently  exceed  OLS  estimates  is  that 
measurement error attenuates the OLS estimates (e.g., Griliches, 1977).

In  a  recent  paper,  Kane,  Rouse  and  Staiger  (1997)  emphasize  that  IV  can  yield  inconsistent  parameter 
estimates  if  the  endogenous  regressor  is  measured  with  non-classical  measurement  error.58  Specifically,  they 
show that if the mismeasured endogenous regressor, $X_i$, is a dummy variable, the measurement error will be
correlated  with  the  instrument,  and  typically  bias  the  magnitude  of  IV  coefficients  upward.59  The  probability 
limit  of  the  IV  estimate  in  this  case  is: 


(48) $\mathrm{plim}\, \hat{\beta}_{IV} = \dfrac{\beta}{1 - P(X_i = 0 \mid X_i^* = 1) - P(X_i = 1 \mid X_i^* = 0)}$

Intuitively, the parameter of interest is inflated because it is divided by one minus the sum of the probabilities of the two types of
errors that can be made in measuring $X_i$ (observations that are 1's can be classified as 0's, and observations that
are 0's can be classified as 1's). The reason IV tends to overestimate the parameter of interest is that if $X_i$ is
a binary variable, the value of the measurement error is automatically dependent on the true value of $X_i^*$, and
therefore must be correlated with the instrumental variable because the instrumental variable is correlated with
$X_i^*$. Combining this result with the earlier discussion of attenuation bias, it should be clear that if the regressor
is a binary variable (in a bivariate regression), the probability limits of the OLS and IV estimators bound the
coefficient  of  interest,  assuming  the  specifications  are  otherwise  appropriate.  In  the  more  general  case  of 
nonclassical  measurement  error  in  a  continuous  explanatory  variable,  IV  estimates  can  be  attenuated  or 
inflated,  as  in  the  case  of  OLS. 
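
The bounding argument for a misclassified dummy regressor can be illustrated by simulation. Everything in the sketch below is assumed for illustration: a binary instrument, misclassification rates of 10 percent in each direction that are independent of the instrument and the equation error, a true coefficient of 1, and the hypothetical helper `cov`. Under these assumptions the OLS slope should settle near 0.8 and the IV (Wald) estimate near $1/(1 - 0.1 - 0.1) = 1.25$, so the two bracket the true value.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(3)
n, beta = 200_000, 1.0
p10 = 0.10   # assumed P(X = 0 | X* = 1)
p01 = 0.10   # assumed P(X = 1 | X* = 0)

z, x_obs, y = [], [], []
for _ in range(n):
    zi = 1 if rng.random() < 0.5 else 0              # binary instrument
    xs = 1 if rng.random() < 0.3 + 0.4 * zi else 0   # true dummy, shifted by the instrument
    flip = rng.random() < (p10 if xs == 1 else p01)  # misclassification event
    xo = 1 - xs if flip else xs                      # reported dummy
    z.append(zi)
    x_obs.append(xo)
    y.append(beta * xs + rng.gauss(0, 1))            # outcome depends on the true dummy

b_ols = cov(y, x_obs) / cov(x_obs, x_obs)   # attenuated toward 0
b_iv = cov(y, z) / cov(x_obs, z)            # IV estimate using z
iv_plim = beta / (1 - p10 - p01)            # probability limit of the IV estimate

print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, IV plim = {iv_plim:.3f}, true beta = {beta}")
```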

4.2  The  Extent  of  Measurement  Error  in  Labor  Data 

Mellow and Sider (1983) provide one of the first systematic studies of the properties of measurement


58A  similar  point  has  been  made  by  James  Heckman  in  an  unpublished  comment  on  Ashenfelter  and  Krueger  (1994). 
59The exception is if $X_i$ is so poorly measured that it is negatively correlated with $X_i^*$.


error  in  survey  data.  They  examined  two  sources  of  data:  (1)  employee-reported  data  from  the  January  1977 
CPS  linked  to  employer-reported  data  on  the  same  variables  for  sampled  employees;  (2)  an  exact  match 
between  employees  and  employers  in  the  1980  Employment  Opportunity  Pilot  Project  (EOPP).  Mellow  and 
Sider  focus  on  the  extent  of  agreement  between  employer  and  employee  reported  data,  rather  than  the 
reliability  of  the  CPS  data  per  se.  For  example,  they  find  that  92.3%  of  employers  and  employees  reported  the 
same  one-digit  industry,  while  at  the  three-digit-industry  level,  the  rate  of  agreement  fell  to  71.1%.  For  wages, 
they  find  that  the  employer-reported  data  exceeded  the  employee-reported  data  by  about  5%.  The  mean  union 
rate  was  slightly  higher  in  the  employer-reported  data  than  in  the  employee-reported  data.  They  also  found 
that  estimates  of  micro-level  human  capital  regressions  yielded  qualitatively  similar  results  whether  employee- 
reported  or  employer-reported  data  are  used.  This  similarity  could  result  from  the  occurrence  of  roughly  equal 
amounts  of  noise  in  the  employer  and  employee  reported  data. 

Several  other  studies  have  estimated  reliability  ratios  for  key  variables  of  interest  to  labor  economists. 
Two  approaches  to  estimating  reliability  ratios  have  typically  been  used.  First,  if  the  researcher  is  willing  to 
call one source of data the truth, then $\lambda$ can be estimated directly as the ratio of the variances: $V(X_i^*)/V(X_i)$.
Second, if two measures of the same variable are available (denoted $X_{1i}$ and $X_{2i}$), and if the errors in these
variables are uncorrelated with each other and uncorrelated with the true value, then the covariance between
$X_{1i}$ and $X_{2i}$ provides an estimate of $V(X_i^*)$. The reliability ratio $\lambda$ can then be estimated by using the variance
of  either  measure  as  the  denominator  or  by  using  the  geometric  average  of  the  two  variances  as  the 
denominator.  The  former  can  be  calculated  as  the  slope  coefficient  from  a  regression  of  one  measure  on  the 
other,  and  the  latter  can  be  calculated  as  the  correlation  coefficient  between  the  two  measures.  If  a  regression 
approach  is  used,  the  variable  that  corresponds  most  closely  to  the  data  source  that  is  usually  used  in  analysis 
should  be  the  explanatory  variable  (because  the  two  sources  may  have  different  error  variances). 
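
The two-measure approach can be sketched as follows. The data are simulated under assumed variances ($V(X^*) = 1$, error variances of 0.25 and 0.50), and `cov` is a hypothetical helper; the point is only to show which ratio each calculation recovers. The slope from regressing one measure on the other estimates the reliability of the measure used as the explanatory variable, while the correlation coefficient uses the geometric average of the two variances.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(4)
n = 100_000
x_true = [rng.gauss(0, 1) for _ in range(n)]            # assumed V(X*) = 1
x1 = [x + rng.gauss(0, 0.25 ** 0.5) for x in x_true]    # first report, V(e1) = 0.25
x2 = [x + rng.gauss(0, 0.50 ** 0.5) for x in x_true]    # second report, V(e2) = 0.50

signal_var = cov(x1, x2)             # covariance of the two reports estimates V(X*)
lam_x1 = signal_var / cov(x1, x1)    # slope from regressing x2 on x1: reliability of x1
lam_x2 = signal_var / cov(x2, x2)    # slope from regressing x1 on x2: reliability of x2
corr = signal_var / (cov(x1, x1) * cov(x2, x2)) ** 0.5  # correlation coefficient

print(f"estimated V(X*)   = {signal_var:.3f}")
print(f"reliability of x1 = {lam_x1:.3f} (0.800 under the assumed variances)")
print(f"reliability of x2 = {lam_x2:.3f} (0.667 under the assumed variances)")
print(f"correlation       = {corr:.3f}")
```

Because the two reports have different error variances here, the two regression slopes differ, which is why the choice of explanatory variable matters.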

An example of two mismeasured reports on a single variable is respondents' reports of their parents'
education  in  Ashenfelter  and  Krueger's  (1994)  twins  study.  Each  adult  twin  was  asked  to  report  the  highest 
grade of education attained by his or her mother and father. Because each member of a pair of twins has the
same  parents,  the  responses  should  be  the  same,  and  there  is  no  reason  to  prefer  one  twin's  response  over  the 
other's.  Differences  between  the  two  responses  for  the  same  pair  of  twins  represent  measurement  error  on  the 
part  of  at  least  one  twin.  The  correlation  between  the  twins'  reports  of  their  father's  education  is  .86,  and  the 
correlation  between  reports  of  their  mother's  education  is  .84.  These  figures  probably  overestimate  the 
reliability  of  the  parental  education  data  because  the  reporting  errors  are  likely  to  be  positively  correlated;  if 
a  parent  mis-represented  his  education  to  one  twin,  he  is  likely  to  have  similarly  mis-represented  his  education 
to  the  other  twin  as  well. 

Table  9  summarizes  selected  estimates  of  the  reliability  ratio  for  self-reported  log  earnings,  hours 
worked,  and  years  of  schooling,  three  of  the  most  commonly  studied  variables  in  labor  economics.  These 
estimates  provide  an  indication  of  the  extent  of  attenuation  bias  when  these  variables  appear  as  explanatory 
variables.  All  of  the  estimates  of  the  reliability  of  earnings  data  in  the  table  are  derived  by  comparing 
employees'  reported  earnings  data  with  their  employers'  personnel  records  or  tax  reports.  The  estimates  from 
the  PSID  validation  study  are  based  on  data  from  a  single  plant,  which  probably  reduces  the  variance  of 
correctly-measured  variables  compared  to  their  variance  in  the  population.  This  in  turn  reduces  the  estimated 
reliability  ratio  if  reporting  errors  have  the  same  distribution  in  the  plant  as  in  the  population. 

Estimates of $\lambda$ for cross-sectional earnings range from .70 to .80 for men; $\lambda$ is somewhat higher for
women.  The  estimated  reliability  falls  to  about  0.60  when  the  earnings  data  are  expressed  as  year-to-year 
changes.  The  decline  in  the  reliability  of  the  earnings  data  is  not  as  great  if  four-year  changes  are  used  instead 
of  annual  changes,  reflecting  the  fact  that  there  is  greater  variance  in  the  signal  in  earnings  over  longer  time 
periods.  Interestingly,  the  PSID  validation  study  also  suggests  that  hours  data  are  considerably  less  reliable 
than  earnings  data. 

The  reliability  of  self-reported  education  has  been  estimated  by  comparing  the  same  individual's 
reports  of  his  own  education  at  different  points  in  time,  or  by  comparing  different  siblings'  reports  of  the  same 
individual's education. The estimates of the reliability of education are in the neighborhood of .90. Because
education  is  often  an  explanatory  variable  of  interest  in  a  cross-sectional  wage  equation,  measurement  error 
can  be  expected  to  reduce  the  return  to  a  year  of  education  by  about  10  percent  (assuming  there  are  no  other 
covariates).  The  table  also  indicates  that  if  differences  in  educational  attainment  between  pairs  of  twins  or 
siblings  are  used  to  estimate  the  return  to  schooling  (e.g.,  Taubman,  1976;  Behrman,  Hrubec,  Taubman,  and 
Wales  1980;  Ashenfelter  and  Krueger,  1994;  and  Ashenfelter  and  Zimmerman,  1997),  then  the  effect  of 
measurement  error  is  greatly  exacerbated.  This  is  because  schooling  levels  are  highly  correlated  between 
twins,  while  measurement  error  is  magnified  because  reporting  errors  appear  to  be  uncorrelated  between  twins. 
This  situation  is  analogous  to  the  effect  of  measurement  error  in  panel  data  models  discussed  above. 

To  further  explore  the  extent  of  measurement  error  in  labor  data,  we  re-analyzed  the  CPS  data 
originally  used  by  Mellow  and  Sider  (1983).  Figure  6  presents  a  scatter  diagram  of  the  employer-reported  log 
hourly  wage  against  the  employee-reported  log  hourly  wage.60  Although  most  points  cluster  around  the  45 
degree  line,  there  are  clearly  some  outliers.  Some  of  the  large  outliers  probably  result  from  random  coding 
errors,  such  as  a  misplaced  decimal  point. 

Researchers  have  employed  a  variety  of  "trimming"  techniques  to  try  to  minimize  the  effects  of 
observations  that  may  have  been  misreported.  An  interesting  study  of  historical  data  by  Stigler  (1977)  asks 
whether  statistical  methods  that  downweight  outliers  would  have  reduced  the  bias  in  estimates  of  physical 
constants  in  20  early  scientific  data  sets.  These  constants,  such  as  the  speed  of  light  or  parallax  of  the  sun, 
have since been determined with certainty. Of the 11 estimators that he evaluated, Stigler found that the
unadjusted  sample  mean,  or  a  10  percent  "winsorized  mean,"  provided  estimates  that  were  closest  to  the 
correct  parameters.  The  10  percent  winsorized  mean  sets  the  values  of  observations  in  the  bottom  or  top  decile 
equal  to  the  value  of  the  observation  at  the  10th  or  90th  percentile,  and  simply  calculates  the  mean  for  this 


60Earnings in the data analyzed by Mellow and Sider were calculated in a manner similar to that used in the
redesigned  CPS.  First,  households  and  firms  were  asked  for  the  basis  on  which  the  employee  was  paid,  and  then 
earnings  were  collected  on  that  basis.  Usual  weekly  hours  were  also  collected.  The  household  data  may  have  been 
reported  by  the  worker  or  by  a  proxy  respondent. 
"adjusted" sample.

In  a  similar  vein,  we  used  Mellow  and  Sider's  linked  employer-employee  CPS  data  to  explore  the 
effect  of  various  methods  for  trimming  outliers.  The  analysis  here  is  less  clear  cut  than  in  Stigler's  paper 
because  the  true  values  are  not  known  (i.e.,  we  are  not  sure  the  employer-reported  data  are  the  "true"  data), 
but  we  can  still  compare  the  reliability  of  the  employee  and  employer  reported  data  using  various  trimming 
methods.  The  first  column  of  Table  10  reports  the  difference  in  mean  earnings  between  the  employee  and 
employer  responses  for  the  wage  and  hours  data.  The  differences  are  small  and  statistically  insignificant. 
Column  2  reports  the  correlation  between  the  employee  report  and  the  employer  report,  while  column  3  reports 
the  slope  coefficient  from  a  bivariate  regression  of  the  employer  report  on  the  employee  report.  The  regression 
coefficient  in  column  3  probably  provides  the  most  robust  measure  of  the  reliability  of  the  data.  Columns  4 
and  5  report  the  variances  of  the  employee  and  employer  data.  Results  in  Panel  A  are  based  on  the  full  sample 
without  any  trimming.  Panel  B  presents  results  for  a  1  percent  and  a  10  percent  "winsorized"  sample.  We  also 
report  results  for  a  1  percent  and  10  percent  truncated  sample,  which  drops  from  the  sample  observations  in 
either  tail  of  the  distribution.  Whereas  the  winsorized  sample  rolls  back  extreme  values  (defined  as  the  bottom 
or  top  X  percent)  but  retains  them  in  the  sample,  the  truncated  sample  simply  drops  the  extreme  observations 
from  the  sample.61  In  Panel  B  only  the  employee-reported  data  have  been  trimmed,  because  that  is  all  that 
researchers  typically  observe.  In  Panel  C,  we  trim  both  the  employee  and  employer  reported  data. 
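
The difference between winsorizing and truncating can be made concrete with a few lines of code. This is a minimal sketch with hypothetical function names and made-up data, not the procedure used to build Table 10; in particular, the rule for counting the extreme observations in each tail (`int(n * share)`) is an assumption.

```python
def _tail_count(n: int, share: float) -> int:
    """Number of observations treated as extreme in each tail."""
    if not 0.0 <= share < 0.5:
        raise ValueError("share must lie in [0, 0.5)")
    return int(n * share)

def winsorize(values, share):
    """Roll extreme values back to the nearest retained value.
    The sample size is unchanged."""
    ordered = sorted(values)
    k = _tail_count(len(ordered), share)
    lo, hi = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(v, lo), hi) for v in values]

def truncate(values, share):
    """Drop the extreme observations in each tail (returned in sorted order)."""
    ordered = sorted(values)
    k = _tail_count(len(ordered), share)
    return ordered[k:len(ordered) - k]

def mean(values):
    return sum(values) / len(values)

wages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up data with one gross outlier
print(mean(wages))                    # 14.5
print(winsorize(wages, 0.10))         # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
print(mean(winsorize(wages, 0.10)))   # 5.5
print(truncate(wages, 0.10))          # [2, 3, 4, 5, 6, 7, 8, 9]
```

In this example both adjustments remove the influence of the value 100 on the mean, but the winsorized sample keeps all ten observations while the truncated sample keeps eight.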

For  hours,  the  unadjusted  data  have  reliability  ratios  around  .80.  Interestingly,  the  reliability  of  the 
hours  data  is  considerably  higher  in  Mellow  and  Sider's  data  than  in  the  PSID  validation  study.  This  may 
result  because  the  PSID  validation  study  was  confined  to  one  plant  (which  restricted  true  hours  variability 
compared  to  the  entire  workforce),  or  because  there  is  a  difference  between  the  reliability  of  log  weekly  hours 
and  annual  hours. 


61Loosely speaking, winsorizing the data is desirable if the extreme values are exaggerated versions of the true
values,  but  the  true  values  still  lie  in  the  tails.  Truncating  the  sample  is  more  desirable  if  the  extremes  are  mistakes 
that  bear  no  resemblance  to  the  true  values. 
The reliability ratio is lower for the wage data than the hours data in the CPS sample. For hours and
wages,  the  correlation  coefficients  change  little  when  the  samples  are  adjusted  (either  by  winsorizing  or 
truncating  the  sample),  but  the  slope  coefficients  are  considerably  larger  in  the  adjusted  data  and  exceed  1.0 
in  the  10  percent  winsorized  samples.  When  both  the  employer  and  employee  data  are  trimmed,  the  reliability 
of  the  wage  data  improves  considerably,  while  the  reliability  of  the  hours  data  is  not  much  affected.  These 
results  suggest  that  extreme  wage  values  are  likely  to  be  mistakes.  Overall,  this  brief  exploration  suggests  that 
a  small  amount  of  trimming  could  be  beneficial.  In  a  study  of  the  effect  of  UI  benefits  on  consumption,  Gruber 
(1997)  recommends  winsorizing  the  extreme  1  percent  of  observations  on  the  dependent  variable 
(consumption),  to  reduce  residual  variability.  A  similar  practice  seems  justifiable  for  earnings  as  well. 

The  estimates  in  Table  9  or  10  could  be  used  to  "inflate"  regression  coefficients  for  the  effect  of 
measurement  error  bias,  provided  that  there  are  no  covariates  in  the  equation.  Typically,  however,  regressions 
include covariates. Consequently, in Table 11 we use Mellow and Sider's CPS sample to regress the employer-reported
data on the employee-reported data and several commonly used covariates (education, marital status,
race,  sex,  experience  and  veteran  status).  For  comparison,  the  first  two  columns  present  the  correlation 
coefficient  and  the  slope  coefficient  from  a  bivariate  regression  of  the  employer  on  the  employee  data.  The 
third  column  reports  the  coefficient  on  the  employee-reported  variable  from  a  multiple  regression  which 
specifies  the  employer-reported  variable  as  the  dependent  variable,  and  the  corresponding  employee-reported 
variable  as  an  explanatory  variable  along  with  other  commonly  used  explanatory  variables;  this  column 
provides  the  appropriate  estimates  of  attenuation  bias  for  a  multiple  regression  which  includes  the  same  set 
of  explanatory  variables  as  included  in  the  table.  Notice  that  the  reliability  of  the  wage  data  falls  from  .77  to 
.66  once  standard  human  capital  controls  are  included.  By  contrast,  the  reliability  of  the  hours  data  is  not  very 
much  affected  by  the  presence  of  control  variables,  because  hours  are  only  weakly  correlated  with  the  controls. 

Table 11 also reports estimates of the reliability of reported union coverage status, industry and
occupation. Assuming the employer-reported data are correct, the bivariate regression suggests that union
status has a reliability ratio of .84.62 Interestingly, this is unchanged when covariates are included. To convert
the  industry  and  occupation  dummy  variables  into  a  one-dimensional  variable,  we  assigned  each  industry  and 
occupation  the  wage  premium  associated  with  employment  in  that  sector  based  on  Krueger  and  Summers 
(1987).  The  occupation  data  seem  especially  noisy,  with  an  estimated  reliability  ratio  of  .75  conditional  on 
the  covariates. 

Earlier  we  mentioned  that  classical  measurement  error  has  a  greater  effect  if  variables  are  expressed 
as  changes.  Although  we  cannot  examine  longitudinal  changes  with  Mellow  and  Sider's  data,  a  dramatic 
illustration of the effect of measurement error on industry and occupation changes is provided by the 1994 CPS
redesign.  The  redesigned  CPS  prompts  respondents  who  were  interviewed  the  previous  month  with  the  name 
of  the  employer  that  they  reported  working  for  the  previous  month,  and  then  asks  whether  they  still  work  for 
that  employer.  If  respondents  answer  "no,"  they  are  asked  an  independent  set  of  industry  and  occupation 
questions.  If  they  answer  "yes,"  they  are  asked  if  the  usual  activities  and  duties  on  their  job  changed  since  last 
month.  If  they  report  that  their  activities  and  duties  were  unchanged,  they  are  then  asked  to  verify  the  previous 
month's  description  of  their  occupation  and  activities.  Lastly,  if  they  answer  that  their  activities  and  duties 
changed,  they  are  asked  an  independent  set  of  questions  on  occupation,  activities,  and  class  of  worker.  Based 
on pre-tests of the redesigned CPS in 1991, Rothgeb and Cohany (1992) find that the proportion of workers
who  appear  to  change  three-digit  occupations  from  one  month  to  the  next  falls  from  39  percent  in  the  old 
version  of  the  CPS  to  7  percent  in  the  redesigned  version.63  The  proportion  who  change  three-digit  industry 


62We make the (most likely incorrect) assumption that the employer-reported union data are correct because union
status is a dummy variable, so measurement errors will be correlated with true union status. If union status is correctly
reported by employers, the regression coefficient nonetheless provides a consistent estimate of the attenuation bias.
Additionally,  note  that  the  reliability  of  data  on  union  status  depends  on  the  true  fraction  of  workers  who  are  covered 
by  a  union  contract.  Since  union  coverage  as  a  fraction  of  the  workforce  has  declined  over  time,  the  reliability  ratio 
might  be  even  lower  today.  As  an  extreme  example,  note  that  even  if  the  true  union  coverage  rate  falls  to  zero,  the 
measured  rate  will  exceed  zero  because  some  (probably  around  3  percent)  nonunion  workers  will  be  erroneously 
classified  as  covered  by  a  union.  See  Freeman  (1984),  Jakubson  (1986)  and  Card  (1996)  for  analyses  of  the  effect  of 
measurement  error  in  union  status  in  longitudinal  data. 

63It  is  also  possible  that  dependent  interviewing  reduces  occupational  changes  because  some  respondents  find  it 
easier  to  complete  the  interview  by  reporting  that  they  did  not  change  employers  even  if  they  did.  Although  this  is 
possible,  Rothgeb  and  Cohany  point  out  that  asking  independent  occupation  and  industry  questions  of  individuals 
between adjacent months falls from 23 percent to 5 percent. These large changes in the gross industry and
occupation  flows  obviously  change  one's  impression  of  the  labor  market.64 
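
The mechanism behind these spurious transitions can be sketched with a simulation. Everything here is hypothetical: the number of codes, the true change rate, and the miscoding rate are illustrative values rather than estimates from the CPS, and the sketch assumes that respondents report correctly whether their job changed.

```python
import random

random.seed(2)
N_CODES = 500        # hypothetical number of three-digit occupation codes
TRUE_CHANGE = 0.05   # hypothetical true monthly change rate
ERROR_RATE = 0.15    # hypothetical chance that a response is miscoded

def code_response(true_code):
    """Return the recorded code: a random wrong code with probability ERROR_RATE."""
    if random.random() < ERROR_RATE:
        wrong = random.randrange(N_CODES - 1)
        return wrong if wrong < true_code else wrong + 1
    return true_code

n = 50000
true_changes = independent_changes = dependent_changes = 0
for _ in range(n):
    occ1 = random.randrange(N_CODES)
    changed = random.random() < TRUE_CHANGE
    if changed:
        other = random.randrange(N_CODES - 1)
        occ2 = other if other < occ1 else other + 1
    else:
        occ2 = occ1
    true_changes += changed
    rec1 = code_response(occ1)
    # Independent interviewing: the second month is coded from scratch.
    independent_changes += rec1 != code_response(occ2)
    # Dependent interviewing: re-code only when the worker reports a change.
    rec2 = code_response(occ2) if changed else rec1
    dependent_changes += rec1 != rec2

print("true change rate:       ", true_changes / n)
print("independent interviews: ", independent_changes / n)
print("dependent interviews:   ", dependent_changes / n)
```

With independent coding in each month, a worker who did not change jobs appears to change whenever either month is miscoded, so even a moderate error rate inflates measured flows well above the true rate.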

4.3  Weighting  and  Allocated  Values 

Many  data  sets  use  complicated  sampling  designs  and  come  with  sampling  weights  that  reflect  the 
design.  Researchers  are  often  confronted  with  the  question  of  whether  to  employ  sample  weights  in  their 
statistical  analyses  to  adjust  for  nonrandom  sampling.  For  example,  if  the  sampling  design  uses  stratified 
sampling  by  state,  with  smaller  states  sampled  at  a  higher  rate  than  larger  states,  then  observations  from  small 
states  should  get  less  weight  if  national  statistics  are  to  be  representative.  In  addition  to  providing  sample 
weights  for  this  purpose,  the  Census  Bureau  also  "allocates"  answers  for  individuals  who  do  not  respond  to 
a  question  in  one  of  their  surveys.  Missing  data  are  allocated  by  inserting  information  for  a  randomly  chosen 
person  who  is  matched  to  the  person  with  missing  data  on  the  basis  of  major  demographic  characteristics. 
Consequently,  there  are  no  "missing  values"  on  Census  Bureau  micro  data  files.  But  researchers  may  decide 
to  include  or  exclude  observations  with  allocated  responses  since  information  that  has  been  allocated  is 
identified  with  "allocation  flags."  Unfortunately,  although  there  is  a  large  literature  on  weighting  and  survey 
nonresponse,  this  literature  has  not  produced  any  easy  answers  that  apply  to  all  data  sets  and  research  questions 
(see,  for  example,  Rubin,  1983;  Dickens,  1985;  Lillard,  Smith  and  Welch,  1986;  Deaton,  1995,  1997;  or 
Groves,  1998).65 
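
The stratified-sampling example in this paragraph can be illustrated with a short sketch. The two-state population, the earnings distributions, and the sampling rates are all hypothetical; each sampled observation receives a weight equal to the inverse of its sampling probability.

```python
import random

random.seed(3)
# Hypothetical population: a large state and a small state with different mean log earnings.
large = [random.gauss(6.4, 0.7) for _ in range(90000)]
small = [random.gauss(6.0, 0.7) for _ in range(10000)]
population_mean = (sum(large) + sum(small)) / (len(large) + len(small))

# The small state is sampled at a higher rate (10 percent) than the large state (1 percent).
RATE_LARGE, RATE_SMALL = 0.01, 0.10
sample = [(y, 1 / RATE_LARGE) for y in large if random.random() < RATE_LARGE]
sample += [(y, 1 / RATE_SMALL) for y in small if random.random() < RATE_SMALL]

unweighted = sum(y for y, _ in sample) / len(sample)
weighted = sum(y * w for y, w in sample) / sum(w for _, w in sample)
print(population_mean, unweighted, weighted)
```

Because observations from the over-sampled small state carry less weight, the weighted mean is close to the population mean, while the unweighted mean is pulled toward the small state.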

Two  data  sets  where  both  weighting  and  allocation  issues  come  up  are  the  CPS  and  the  1990  Census 
Public  Use  Micro  Sample  (PUMS),  neither  of  which  is  a  simple  random  sample.  The  CPS  uses  a  complicated 


who  report  changing  employers  could  result  in  spurious  industry  and  occupation  changes.  In  addition,  the  large 
number  of  mismatches  between  employer  and  employee  reported  occupation  and  industry  data  in  Mellow  and 
Sider's data set is consistent with a finding of grossly overestimated gross industry and occupation flows.

64See also Poterba and Summers (1986), who estimate the measurement error in employment-status transitions.

65But see DuMouchel and Duncan (1983), who note that if the object of regression is a MMSE linear approximation
to  the  CEF  then  estimates  from  non-random  samples  should  be  weighted. 
multi-stage probability sample that over-samples some states, and recently oversamples Hispanics in the March
survey (see, e.g., Bureau of the Census 1992). The 1990 PUMS also deviates from random sampling because
of over-sampling of small areas and Native Americans (Bureau of the Census, 1996).66 And even
random samples may fail to be representative by chance, or because some sampled households are not actually
interviewed. The sampling weights included with CPS and PUMS micro data are meant to correct for features
of the sample design, as well as deviations from random sampling due to chance or nonresponse that affect
the age, sex, Hispanic origin, or race make-up of the sample. Missing data for respondents in these data sets
are also allocated. And in the CPS, if someone fails to answer a monthly supplement (e.g., the March income
supplement), then the entire record is allocated by drawing a randomly matched "donor record" from someone who
did  respond. 

To  assess  the  consequences  of  weighting  and  allocation  for  one  important  area  of  research,  we 
estimated  a  standard  human  capital  earnings  function  with  data  from  the  1990  March  CPS  and  1990  5  percent 
PUMS  for  the  four  permutations  of  weighting  or  not  weighting,  and  including  or  excluding  observations  with 
allocated  responses.  The  samples  consist  of  white  and  black  men  age  40  to  49  with  at  least  8  years  of 
education.67  Regression  results  and  mean  log  weekly  earnings  are  summarized  in  Table  12.  In  both  data  sets, 
the  estimated  regression  coefficients  are  remarkably  similar  regardless  of  whether  the  equation  is  estimated 
by  OLS  or  weighted  least  squares  to  adjust  for  sample  weights,  and  regardless  of  whether  the  observations  with 
allocated  values  are  excluded  or  included  in  the  sample.  Moreover,  except  for  potential  experience,  the 
regression  coefficients  are  quite  similar  if  they  are  estimated  with  either  the  Census  or  CPS  sample.  One 
notable  difference  between  the  data  sets,  however,  is  that  mean  log  earnings  are  about  6  points  higher  in  the 
Census  than  the  CPS  for  this  age  group. 
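
A sketch of the OLS versus weighted least squares comparison follows. It uses simulated data, not the CPS or PUMS: two hypothetical strata differ in mean schooling and in sampling weight, but share the same linear earnings function, which is the case in which weighting should matter little for the slope.

```python
import random

def weighted_slope(y, x, w):
    """Slope from a weighted least squares regression of y on x (with intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

random.seed(5)
x, y, w = [], [], []
# (mean schooling, sample size, sampling weight) for two hypothetical strata
for mean_educ, n_obs, weight in [(13.0, 2000, 100.0), (11.0, 2000, 10.0)]:
    for _ in range(n_obs):
        educ = random.gauss(mean_educ, 2.5)
        x.append(educ)
        y.append(5.0 + 0.10 * educ + random.gauss(0, 0.7))
        w.append(weight)

ols = weighted_slope(y, x, [1.0] * len(x))  # equal weights reproduce OLS
wls = weighted_slope(y, x, w)
print(ols, wls)
```

If the earnings function differed across strata, the two estimates could diverge, which is one reason to compare them in practice.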


66The 1980 PUMS are simple random samples. The CPS was stratified but self-weighting (i.e., all observations were
equally  likely  to  be  sampled)  until  January  1978. 

67In  addition,  to  make  the  samples  comparable,  the  Census  sample  excludes  men  who  were  on  active  duty  in  the 
military,  and  the  CPS  sample  excludes  the  Hispanic  oversample  and  the  men  in  the  armed  forces.  The  education 
variable  in  both  data  sets  was  converted  to  linear  years  of  schooling  based  on  highest  degree  attained. 


The  results  in  Table  12  suggest  that  estimates  of  a  human  capital  earnings  function  using  CPS  and 
Census  data  are  remarkably  robust  to  whether  or  not  the  sample  is  weighted  to  account  for  the  sample  design, 
and  whether  or  not  observations  with  allocated  values  are  included  in  the  sample.  At  least  for  this  application, 
nonrandom  sampling  and  the  allocation  of  missing  values  are  not  very  important.68  It  should  be  noted, 
however,  that  Census  Bureau  surveys  analyzed  here  are  relatively  close  to  random  samples,  and  that  the  sample 
strata  involve  covariates  that  are  included  in  the  regression  models.  Some  of  the  data  sets  discussed  earlier, 
most  notably  the  NLSY  and  the  PSID,  include  large  non-random  sub-samples  that  more  extensively  select  or 
over-sample  certain  groups  using  a  wider  range  of  characteristics,  including  racial  minorities,  low-income 
respondents, or military personnel. When working with these data it is important to check whether the use of
a  non-representative  sample  affects  empirical  results.  Moreover,  since  researchers  often  compare  results  across 
samples,  weighting  may  be  desirable  if  this  helps  reduce  the  likelihood  that  differences  in  sample  design 
generate  different  results. 

5.  Summary 

This  chapter  attempts  to  provide  an  overview  of  the  empirical  strategies  used  in  modern  labor 
economics.  The  first  step  is  to  specify  a  causal  question,  which  we  think  of  as  comparing  actual  and 
counterfactual  states.  The  next  step  is  to  devise  a  strategy  that  can,  in  principle,  answer  the  question.  A  critical 
issue  in  this  context  is  how  the  causal  effect  of  interest  is  identified  by  the  statistical  analysis.  In  particular, 
why  does  the  explanatory  variable  of  interest  vary  when  other  variables  are  held  constant?  Who  is  implicitly 
being  compared  to  whom?  Does  the  source  of  variation  used  to  identify  the  key  parameters  provide  plausible 
"counterfactuals"?  And  can  the  identification  strategy  be  tested  in  a  situation  in  which  the  causal  variable  is 
not  expected  to  have  an  effect?  Finally,  implementation  of  the  empirical  strategy  requires  appropriate  data, 


68Of course, the standard errors of the estimates should reflect the sample design and account for changes in
variability due to allocation. But for samples of this size, the standard errors are extraordinarily small, so adjusting
them for these features of the data is probably of second-order importance.

and careful attention to the many measurement problems that are likely to arise along the way.

Appendix

A.  Derivation  of  equation  (9)  in  the  text 

The  model  is 

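\[ Y_i = \beta_0 + \rho S_i + \eta_i, \qquad E[S_i \eta_i] = 0 \]
\[ A_i = \gamma_0 + \gamma_1 S_i + \eta_{1i}, \qquad E[S_i \eta_{1i}] = 0 \]

The coefficient on $S_i$ in a regression of $Y_i$ on $S_i$ and $A_i$ is $C(Y_i, S_{\cdot Ai})/V(S_{\cdot Ai})$, where

\[ S_{\cdot Ai} = S_i - \pi_0 - \pi_1 A_i, \quad \text{and} \quad \pi_1 = \gamma_1 V(S_i)/V(A_i). \]

Also,

\[ V(S_{\cdot Ai}) = V(S_i) - \pi_1^2 V(A_i) = [V(S_i)/V(A_i)][V(A_i) - \gamma_1^2 V(S_i)] = [V(S_i)/V(A_i)]\, V(\eta_{1i}). \]

So,

\[ C(Y_i, S_{\cdot Ai})/V(S_{\cdot Ai}) = \rho + C(\eta_i, S_i - \pi_0 - \pi_1 A_i)/V(S_{\cdot Ai}) = \rho - \pi_1 C(\eta_i, A_i)/V(S_{\cdot Ai}) = \rho - \pi_1 C(\eta_i, \eta_{1i})/V(S_{\cdot Ai}) \]

\[ = \rho - \gamma_1 \phi_{01}. \]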
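
The result can be checked numerically. The sketch below simulates the two-equation model with hypothetical parameter values and reads $\phi_{01}$ as the coefficient from a regression of $\eta_i$ on $\eta_{1i}$, that is $C(\eta_i, \eta_{1i})/V(\eta_{1i})$, which is the reading implied by the last step of the derivation. The coefficient on $S_i$ is computed by partialling $A_i$ out of $S_i$, mirroring the derivation.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

random.seed(4)
n = 50000
RHO, GAMMA1, PHI01 = 0.10, 0.5, 0.3   # hypothetical parameter values

S = [random.gauss(12, 2) for _ in range(n)]
eta1 = [random.gauss(0, 1) for _ in range(n)]
# eta is correlated with eta1 but unrelated to S; regressing eta on eta1 gives PHI01
eta = [PHI01 * e1 + random.gauss(0, 0.5) for e1 in eta1]
Y = [1.0 + RHO * s + e for s, e in zip(S, eta)]
A = [2.0 + GAMMA1 * s + e1 for s, e1 in zip(S, eta1)]

# Partial A out of S (the intercept does not affect covariances)
pi1 = cov(S, A) / cov(A, A)
S_resid = [s - pi1 * a for s, a in zip(S, A)]
coef_on_S = cov(Y, S_resid) / cov(S_resid, S_resid)

predicted = RHO - GAMMA1 * PHI01
print(coef_on_S, predicted)
```

With these values the formula gives 0.10 - 0.5 * 0.3 = -0.05, and the simulated coefficient should be close to it.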


B.  Derivation  of  equation  (34)  in  the  text 

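To economize on notation, we use $E[Y \mid X, j]$ as shorthand for $E[Y_i \mid X_i, S_i = j]$. Repeating equation (31) in the
text without "i" subscripts:

\[ \rho_r = \frac{E[Y(S - E[S \mid X])]}{E[S(S - E[S \mid X])]} = \frac{E[E(Y \mid S, X)(S - E[S \mid X])]}{E[S(S - E[S \mid X])]} \tag{A.1} \]

Now write

\[ E[Y \mid X, S] = E[Y \mid X, 0] + \sum_{j=1}^{S} \{E[Y \mid X, j] - E[Y \mid X, j-1]\} = E[Y \mid X, S=0] + \sum_{j=1}^{S} \rho_{jx}, \tag{A.2} \]

where

\[ \rho_{jx} \equiv E[Y \mid X, j] - E[Y \mid X, j-1]. \]

We first simplify the numerator of $\rho_r$. Substituting A.2 into A.1:

\[ E[E(Y \mid X, S)(S - E[S \mid X])] = E\Big\{\Big(\sum_{j=1}^{S} \rho_{jx}\Big)(S - E[S \mid X])\Big\} = E\Big\{E\Big[\sum_{j=1}^{S} \rho_{jx}(S - E[S \mid X]) \,\Big|\, X\Big]\Big\} \]

Working with the inner expectation:

\[ E\Big[\sum_{j=1}^{S} \rho_{jx}(S - E[S \mid X]) \,\Big|\, X\Big] = \sum_{s=1}^{\bar{S}} \sum_{j=1}^{s} \rho_{jx}(s - E[S \mid X]) P_{sx}, \]

where

\[ P_{sx} = P(S = s \mid X) \]

and $\bar{S}$ denotes the largest value taken by $S$.

Reversing the order of summation, this equals

\[ \sum_{j=1}^{\bar{S}} \rho_{jx} \sum_{s=j}^{\bar{S}} (s - E[S \mid X]) P_{sx} = \sum_{j=1}^{\bar{S}} \rho_{jx} \mu_{jx}, \]

where

\[ \mu_{jx} = \sum_{s=j}^{\bar{S}} (s - E[S \mid X]) P_{sx}. \]

Now, simplifying,

\[ \mu_{jx} = \sum_{s=j}^{\bar{S}} s P_{sx} - \sum_{s=j}^{\bar{S}} E[S \mid X] P_{sx} = (E[S \mid X, S \ge j] - E[S \mid X])\, P(S \ge j \mid X). \]

Since

\[ E[S \mid X] = E[S \mid X, S \ge j]\, P(S \ge j \mid X) + E[S \mid X, S < j]\,(1 - P(S \ge j \mid X)), \]

\[ \mu_{jx} = (E[S \mid S \ge j, X] - E[S \mid S < j, X])\, P(S \ge j \mid X)(1 - P(S \ge j \mid X)). \]

So we have shown

\[ E[Y(S - E[S \mid X])] = E\Big[\sum_{j=1}^{\bar{S}} \rho_{jx} \mu_{jx}\Big]. \]

A similar argument for the denominator shows

\[ E[S(S - E[S \mid X])] = E\Big[\sum_{j=1}^{\bar{S}} \mu_{jx}\Big]. \]

Substitute S for j to get equation (34) using the notation in the text.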
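
The decomposition can be verified exactly on a small discrete example. The distributions and conditional means below are hypothetical numbers chosen only for illustration; the sketch evaluates the regression coefficient directly and through the weighted sum of increments, and also checks the closed form for the weights.

```python
# Hypothetical discrete example: X takes values 0 and 1, S takes values 0..3.
P_X = {0: 0.6, 1: 0.4}
P_S_GIVEN_X = {0: [0.4, 0.3, 0.2, 0.1], 1: [0.1, 0.2, 0.3, 0.4]}
CEF = {0: [1.0, 1.3, 1.4, 2.0], 1: [1.5, 1.6, 2.2, 2.4]}  # E[Y | X=x, S=s]
S_MAX = 3

def mean_s(x):
    """E[S | X=x]."""
    return sum(s * p for s, p in enumerate(P_S_GIVEN_X[x]))

def expect(f):
    """E[f(X, S)] over the joint distribution of (X, S)."""
    return sum(P_X[x] * P_S_GIVEN_X[x][s] * f(x, s)
               for x in P_X for s in range(S_MAX + 1))

# Direct evaluation: E[E(Y|S,X)(S - E[S|X])] / E[S(S - E[S|X])]
direct = (expect(lambda x, s: CEF[x][s] * (s - mean_s(x)))
          / expect(lambda x, s: s * (s - mean_s(x))))

def mu(j, x):
    """Weight from its definition: sum over s >= j of (s - E[S|X]) P(S=s|X)."""
    return sum((s - mean_s(x)) * P_S_GIVEN_X[x][s] for s in range(j, S_MAX + 1))

def mu_closed(j, x):
    """Closed form: (E[S|S>=j,X] - E[S|S<j,X]) P(S>=j|X) (1 - P(S>=j|X))."""
    p = P_S_GIVEN_X[x]
    p_ge = sum(p[j:])
    e_ge = sum(s * p[s] for s in range(j, S_MAX + 1)) / p_ge
    e_lt = sum(s * p[s] for s in range(j)) / (1 - p_ge)
    return (e_ge - e_lt) * p_ge * (1 - p_ge)

# Weighted average of the increments E[Y|X,j] - E[Y|X,j-1]
num = sum(P_X[x] * (CEF[x][j] - CEF[x][j - 1]) * mu(j, x)
          for x in P_X for j in range(1, S_MAX + 1))
den = sum(P_X[x] * mu(j, x) for x in P_X for j in range(1, S_MAX + 1))
weighted = num / den
print(direct, weighted)
```

The two numbers agree up to floating-point error, and the weights are largest for increments near the middle of the conditional schooling distribution.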
C. Schooling in the 1990 Census

Years of schooling was coded from the 1990 Census categorical schooling variables as follows:

Educational attainment

5th, 6th, 7th, or 8th grade
9th grade
10th grade
11th grade or 12th grade, no diploma
High school graduate, diploma or GED
Some college, but no degree
Completed associate degree in college, occupational program
Completed associate degree in college, academic program
Completed bachelor's degree, not attending school
Completed bachelor's degree, but now enrolled
Completed master's degree
Completed professional degree
Completed doctorate
                                        Labor Economics Articles                  All Fields
                               1965-69  1970-74  1975-79  1980-83  1994-97        1994-97

Theory Only                       14       19       23       29       21             44

Micro data                        11       27       45       46       66             28
  Panel                            1        6       21       18       31             12
  Experiment                       0        0        2        2        2              3
  Cross-Section                   10       21       21       26       25              9

Micro data set
  PSID                             0        0        6        7        7              2
  NLS                              0        3       10        6       11              2
  CPS                              0        1        5        6        8              2
  SEO                              0        4        4        0        1              0
  Census                           3        5        2        0        5              1
  All other micro data sets        8       14       18       27       38             21

Time Series                       42       27       18       16        6             19
Census Tract                       3        2        4        3        0              0
State                              7        6        3        3        2              2
Other aggregate cross-section     14       16        8        4        6              6
Secondary Data Analysis           14        3        3        4        2              2

Total Number of Articles         106      191      257      205      197            993

Notes:  Figures  for  1965-83  are  from  Stafford  (1986).  Figures  for  1994-97  are  based  on  authors' analysis, 
and  pertain  to  the  first  half  of  1997.  Following  Stafford,  articles  are  drawn  from  8  leading  economics  journals. 
Table 4

Differences-in-Differences Estimates of the Effect of Immigration on Unemployment

                                          Year
                               1979       1981      1981-79
Group                           (1)        (2)        (3)

Whites
(1) Miami                       5.1        3.9       -1.2
                               (1.1)      (.9)       (1.4)
(2) Comparison Cities           4.4        4.3        -.1
                               (.3)       (.3)       (.4)
(3) Difference                   .7        -.4       -1.1
    Miami-Comparison           (1.1)      (.95)      (1.5)

Blacks
(4) Miami                       8.3        9.6        1.3
                               (1.7)      (1.8)      (2.5)
(5) Comparison Cities          10.3       12.6        2.3
                               (.8)       (.9)       (1.2)
(6) Difference                 -2.0       -3.0       -1.0
    Miami-Comparison           (1.9)      (2.0)      (2.8)

Notes: Adapted from Card (1990), Tables 3 and 6. Standard errors are shown in parentheses.
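
The arithmetic behind the differences-in-differences entries for whites can be reproduced in a few lines. This is a sketch under two assumptions that the table itself does not state: the 1981 standard error for Miami is taken to be 0.9, which is the value consistent with the reported standard error of 1.4 for the change, and the four cell estimates are treated as independent when standard errors are combined.

```python
import math

# Unemployment rates (percent) and standard errors for whites, from Table 4.
miami = {1979: (5.1, 1.1), 1981: (3.9, 0.9)}       # 0.9 is an assumed reading
comparison = {1979: (4.4, 0.3), 1981: (4.3, 0.3)}

def change(group):
    """Return the 1981-79 change and its standard error, assuming independent samples."""
    estimate = group[1981][0] - group[1979][0]
    std_error = math.hypot(group[1979][1], group[1981][1])
    return estimate, std_error

miami_change, miami_se = change(miami)
comparison_change, comparison_se = change(comparison)

# Differences-in-differences: change in Miami minus change in the comparison cities.
did = miami_change - comparison_change
did_se = math.hypot(miami_se, comparison_se)
print(round(did, 1), round(did_se, 1))
```

The result matches the entry in row (3), column (3): an estimate of -1.1 with a standard error of about 1.5.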
Table 5

IV Estimates of the Effects of Military Service on White Men

                      Earnings                  Veteran Status           Wald Estimate of
Earnings        Mean      Eligibility       Mean      Eligibility        Veteran Effect
year                      Effect                      Effect
                (1)         (2)             (3)         (4)                  (5)

A. Men born 1950
1981          16,461      -435.8            .267        .159               -2,741
                           (40.5)                      (.040)              (1,324)
1970           2,758      -233.8                                           -1,470
                           (39.7)                                           (250)
1969           2,299        -2.0
                           (34.5)

B. Men born 1951
1981          16,049      -358.3            .197        .136               -2,635
                          (203.6)          (.013)      (.043)              (1,497)
1971           2,947      -298.2                                           -2,193
                           (41.7)                                           (307)
1970           2,379       -44.8
                           (36.7)

C. Men born 1953 (no one drafted)
1981          14,762        34.3            .130        .043               no first
                          (199.0)                      (.037)              stage
1972           3,989       -56.5
                           (54.8)
1971           2,803         2.1
                           (42.9)

Note: Adapted from Tables 2 and 3 in Angrist (1990), and unpublished author tabulations. Standard
errors are shown in parentheses. Earnings data are from Social Security administrative records. Figures
are in nominal dollars. Veteran status data are from the Survey of Program Participation. There are about
13,500 observations with earnings in each cohort.
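
The Wald estimates in column (5) are the ratio of the eligibility effect on earnings to the eligibility effect on veteran status. The sketch below reproduces the entries for men born in 1951. Dividing the earnings standard error by the first stage also reproduces the reported standard errors; this is an inference from the numbers, which suggests they ignore sampling error in the first stage, and is not something the table notes state.

```python
def wald(reduced_form, first_stage):
    """Wald estimate: effect of the instrument on the outcome over its effect on treatment."""
    return reduced_form / first_stage

# Men born 1951, from Table 5.
FIRST_STAGE_1951 = 0.136   # eligibility effect on veteran status
# (earnings year, eligibility effect on earnings, standard error)
rows_1951 = [(1981, -358.3, 203.6), (1971, -298.2, 41.7)]

estimates = {}
for year, effect, std_error in rows_1951:
    estimates[year] = (wald(effect, FIRST_STAGE_1951),
                       std_error / FIRST_STAGE_1951)
    print(year, round(estimates[year][0]), round(estimates[year][1]))
```

For 1981 this gives roughly -2,635 with a standard error of 1,497, and for 1971 roughly -2,193 with a standard error of 307.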
Table 6
Matching and regression estimates of the effects of voluntary military service

               Average         Differences in    Matching          Regression        Regression
               earnings in     means by          estimate of       estimate of       minus
               1988-1991       veteran status    veteran effect    veteran effect    Matching
Race             (1)             (2)               (3)               (4)               (5)

Whites          14,537          1,233.4           -197.2            -88.8             108.4
                                 (60.3)           (70.5)            (62.5)            (28.5)

Nonwhites       11,664          2,449.1            839.7           1,074.4            234.7
                                 (47.4)           (62.7)            (50.7)            (32.5)

Notes: Adapted from Tables II and V in Angrist (1998). Standard errors are reported in parentheses. The
table shows estimates of the effect of voluntary military service on the 1988-91 Social Security-taxable
earnings of men who applied to enter the armed forces between 1979 and 1982. The matching and regression
estimates control for applicants' year of birth, education at the time of application, and AFQT score. There are
128,968 whites and 175,262 nonwhites in the sample.
                              Mean Employee                       Employee     Employer
Sample                        Minus Employer      r        β      Variance     Variance

A. Unadjusted Data
   ln wage                        0.017          0.65     0.77     0.355        0.430
   ln hours                      -0.043          0.78     0.87     0.195        0.182

B. Employee Data Winsorized or Truncated
 1% Winsorized Sample
   ln wage                        0.021          0.68     0.88     0.278        0.430
   ln hours                      -0.044          0.77     0.91     0.164        0.182
 10% Winsorized Sample
   ln wage                        0.034          0.68     1.04     0.188        0.430
   ln hours                      -0.069          0.72     1.28     0.064        0.182
 1% Truncated Sample
   ln wage                        0.023          0.68     0.91     0.243        0.413
   ln hours                      -0.041          0.75     0.87     0.134        0.155
 10% Truncated Sample
   ln wage                        0.021          0.60     0.94     0.126        0.307
   ln hours                      -0.030          0.62     0.96     0.033        0.072

C. Both Employee and Employer Data Winsorized or Truncated
 1% Winsorized Sample
   ln wage                        0.025          0.8      0.86     0.278        0.305
   ln hours                      -0.04           0.78     0.85     0.164        0.155
 10% Winsorized Sample
   ln wage                        0.028          0.88     0.92     0.188        0.199
   ln hours                      -0.024          0.84     0.85     0.064        0.059
 1% Truncated Sample
   ln wage                        0.032          0.88     0.92     0.230        0.250
   ln hours                      -0.036          0.76     0.81     0.109        0.125
 10% Truncated Sample
   ln wage                        0.024          0.91     0.94     0.119        0.125
   ln hours                      -0.012          0.8      0.83     0.027        0.028

Notes to Table 10: r is the correlation coefficient between the employee- and employer-reported values.
β is the slope coefficient from a regression of the employer-reported value on the employee-reported
value. Sample size is 3,856 for unadjusted wage data and 3,974 for unadjusted hours data.
In the 1% winsorized sample, the bottom and top 1% of observations were rolled back to the value
corresponding to the 1st or 99th percentile cutoff; in the truncated sample these observations were
deleted from the sample.


Table 11
Estimates of Reliability Ratios from Mellow and Sider's CPS Data Set

                                            Bivariate    Multivariate
Variable                            r           β             β

ln wage unadjusted                 0.65        0.77          0.66
ln wage 1% truncated*              0.68        0.91          0.85
ln wage 1% winsorized*             0.68        0.88          0.79

ln hours unadjusted                0.78        0.87          0.86
ln hours 1% truncated*             0.75        0.87          0.85
ln hours 1% winsorized*            0.77        0.91          0.90

union                              0.84        0.84          0.84
2-digit industry premium           0.93        0.93          0.92
1-digit industry premium           0.91        0.92          0.90
1-digit occupation premium         0.84        0.84          0.75

Notes: r is the correlation coefficient between the employee- and employer-reported
values. β is the coefficient from a regression of the employer-reported value on the
employee-reported value. In the multiple regression, covariates include: highest
grade of school completed, high school diploma dummy, college diploma dummy, marital
status, nonwhite, female, potential work experience, potential work experience
squared, and veteran status. Sample size varies from 3,806 (for industry) to 4,087
(for occupation).

* Only the employee data were truncated or winsorized.
Table 12
Weighting and allocation in the Census and CPS

                                     1990 Census                             March 1990 CPS
Covariate                  (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)

Log wages mean            6.405      6.415      6.425      6.437      6.340      6.348      6.351      6.357
                           .746       .747     (.723)       .721       .732       .734       .717       .723

Education                .10932     .10828     .10920     .10813     .10839     .11139     .10950     .11314
                       (.00047)   (.00047)   (.00049)   (.00049)   (.00442)   (.00438)   (.00459)   (.00459)

White                      .208       .213       .199       .202       .194       .219       .196       .211
                         (.003)     (.003)     (.004)     (.003)     (.030)     (.027)     (.031)     (.029)

Married                    .386       .387       .381       .382       .386       .387       .343       .362
                         (.004)     (.003)     (.004)     (.004)     (.031)     (.029)     (.032)     (.031)

Widowed                    .181       .165       .190       .171       .110       .200       .077       .075
                         (.013)     (.013)     (.014)     (.014)     (.108)     (.105)     (.117)     (.115)

Divorced or                .193       .187       .202       .196       .167       .135       .141       .123
separated                (.004)     (.004)     (.005)     (.004)     (.037)     (.035)     (.039)     (.037)

Hispanic                  -.142      -.151      -.138      -.145      -.125      -.179      -.107      -.155
                         (.005)     (.005)     (.005)     (.005)     (.040)     (.048)     (.041)     (.049)

Veteran                   -.012      -.014      -.018      -.021     -.0001      -.012      -.002      -.015
                         (.002)     (.002)     (.002)     (.002)     (.016)     (.016)     (.017)     (.017)

Potential                  .040       .041       .041       .042      .0005      -.002       .013       .013
experience               (.002)     (.002)     (.002)     (.002)     (.021)     (.022)     (.022)     (.023)

Pot. experience           -.055      -.055      -.057      -.057       .024       .035       .003       .008
squared*100              (.004)     (.004)     (.005)     (.005)     (.043)     (.043)     (.045)     (.045)

Allocated                   yes        yes         no         no        yes        yes         no         no
Weighted                     no        yes         no        yes         no        yes         no        yes
N                       603,763    603,731    527,095    527,071      7,134      7,134      6,361      6,361

Notes: The table reports OLS estimates of wage equations with the indicated covariates. The samples include
black and white men aged 40-49 with at least 8 years of schooling. The Census sample excludes active-duty
military personnel and the CPS sample excludes military personnel and the Hispanic over-sample. The CPS
schooling variable is highest year completed while the census variable is imputed as described in the appendix.
A. Average Class Size and Predicted Class Size
B. Average Reading Scores and Average Predicted Class Size
C. Regression-Adjusted Reading Scores and Predicted Class Size

Figure 2. Illustration of regression-discontinuity method for estimating the effect of class size on
pupils' test scores. Data are from Angrist and Lavy (1998).

A. Conditional expectation function and OLS regression line
B. Schooling histogram and OLS weighting function

Figure 3. Panel A shows the conditional expectation function (CEF) of log weekly earnings given schooling,
adjusted for covariates as described in the text. Also plotted is the average change in the CEF and the OLS
regression line. Panel B shows the schooling histogram and OLS weighting function. Data are for men aged
40-49 in the 1990 Census.

Figure 4. Effects of voluntary military service on earnings in 1988-91, plotted by race and probability of service,
conditional on covariates. The earnings data are from Social Security administrative records.

Figure 5. First quarter-fourth quarter difference in schooling CDFs, for men born 1930-39 in the 1980 Census.
The dotted lines are 95% confidence intervals.